(54) Method and apparatus for processing text and character data



(57) An apparatus and method for processing text or character data are disclosed. A text
processing system receives a character input string and determines whether to apply character
processing. A non-English language such as Italian can be entered into a processing system such
as a computer using a standard English based keyboard such that additional keys for providing
accents or other grammatical and punctuation symbols or characters not existing in English are
not required. In one mode, text is automatically accented or punctuated without requiring user
intervention. In another mode, a user is provided with a list of accent or punctuation choices
so that the user may select the optimum accent or punctuation. Text processing of an input may
be activated by a predefined activator event pressed in a predetermined sequence, or may be
activated in the event a predetermined sequence of characters is received. When an activator
event input is detected, a rules based system is utilized to select a correctly accented and
punctuated character. A list of alternative accents and punctuations is optionally displayed,
and a user may toggle through the list using the activator event to select a desired character.
The display provides information for a level of certainty of a selected character or word.
Description

FIELD OF THE INVENTION 

[0001] The present invention generally relates to the field of information processing, and
particularly to a character processing system.

BACKGROUND OF THE INVENTION 

[0002] The advent of computer technology has revolutionized the way in which people around the
world communicate. One area in which computer technology has provided change is in word and text
processing applications. The first typewriters and computer terminals, which still set standards
for text keyboard layouts, such as the "QWERTY" and "Dvorak" configurations, and for computer
text encoding including the American Standard Code for Information Interchange (ASCII) and the
Extended Binary Coded Decimal Interchange Code (EBCDIC), were invented and widely used in the
United States, which continues to be the primary market for the introduction of such devices,
and in which English is the official language. English is also both the most popular second
language, as well as the second most popular mother language in the world. Written English uses
the Roman alphabet with no diacritical marks (26 characters in upper and lower case: A, B, C, D,
E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y and Z). Most other languages that
use the Roman alphabet use an extended version of such alphabet, where diacritical marks such as
accents and umlauts, for example À, Á, Â or Ä, are combined with certain alphabetical characters
that are also used in English such as A. The characters that are present on keyboards designed
for the English language are also present in most keyboards designed for other languages,
whereas the additional non-English characters vary widely from keyboard design to keyboard
design, depending on the target languages (e.g., German, French, Italian, etc.). In a similarly
limiting way, the first definitions of computer character sets, which specify how each character
is to be stored in computer memory, did not assign codes to letters other than the 26 upper case
and 26 lower case letters used in English. The most important of these first character sets,
which are still in use today, are ASCII, where 7 bits out of 8 are used to store information,
and EBCDIC, which uses 8 bits of data, and is based on IBM's earlier BCD encoding. In the ASCII
set, the upper range of 128 codes having the 8th bit set was left undefined and unused.
Similarly, in EBCDIC, certain blocks of codes were left unused. Over the years, both character
sets have been extended in order to store certain non-English letters, either by replacing
certain non-alphabetical characters with non-English alphabetical ones, or by assigning some
codes which had originally been left undefined. As newer character sets were defined, these in
general maintained backward compatibility with either ASCII or EBCDIC. Even newer 16-bit and
32-bit global character encoding schemes (e.g., Unicode) retain, for compatibility, the original
subset of 7-bit ASCII codes. This illustrates how, both for the layout of text input keyboards,
as well as for character encoding definitions, there is a subset of characters which is in large
part both privileged and standard. This subset includes the 26 letters from A to Z, in upper and
in lower case (a total of 52 alphabetical letters), the 10 digits, as well as certain spacing
and punctuation signs, and other signs such as the apostrophe (ASCII decimal code 39), and the
"grave" character (ASCII decimal code 96), which is very similar to the apostrophe. Neither the
original ASCII nor the original EBCDIC character encoding set provides support for letters used
in non-English languages such as Italian. This means that on systems that employ these character
sets there is no accepted standard for encoding, for example, all the accented letters used in
Italian. Thus there lies a need for a text processing system that allows the accents and
punctuation of a non-English language to be processed by an English based system using standard
English based input devices such as a QWERTY keyboard.
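
The 7-bit limitation described above can be illustrated with a minimal Python sketch. The helper
name fits_in_7bit_ascii is hypothetical and is not part of the described system; only the code
values 39 and 96 are taken from the text.

```python
APOSTROPHE = "'"   # ASCII decimal code 39
GRAVE = "`"        # ASCII decimal code 96


def fits_in_7bit_ascii(text: str) -> bool:
    """Return True when every character has a code below 128 (7 bits)."""
    return all(ord(ch) < 128 for ch in text)


if __name__ == "__main__":
    print(ord(APOSTROPHE), ord(GRAVE))
    # An accented Italian word falls outside the 7-bit range,
    # while the same word typed with a trailing apostrophe does not.
    print(fits_in_7bit_ascii("città"), fits_in_7bit_ascii("citta'"))
```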

SUMMARY OF THE INVENTION

[0003] The present invention is directed to an apparatus for processing character or text input.
In one embodiment, the apparatus includes means for receiving an input, means for determining
whether to execute character processing on the input, means for executing character processing
on the input whereby an output is produced representative of the character processed input, and
means for providing the output to an output system.

[0004] The present invention is further directed to a method for processing character or text
input. In one embodiment, the method includes steps for receiving an input, determining whether
to process the input according to a predetermined character processing rule, in the event it is
determined to process the input, processing the input according to a predetermined character
processing rule whereby an output representative of the processed input is produced, and
providing the output to an output system.

[0005] The present invention is directed in one embodiment to a character encoding and decoding
method that allows accented letters to be stored using a standard unmodified character set, such
as 7-bit ASCII. The encoding method of the present invention can be applied to a stream of data
originating either from a file or from keyboard input events, as well as from other sources. The
basic encoding method can be extended to detect and correct different types of errors in the
input text, as well as to give total control to the user, to handle, for example, exceptions as
well as deliberate errors.
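
One way to picture such an encoding is the minimal Python sketch below. It rests on an
illustrative assumption that is not stated in this paragraph: a grave-accented lowercase vowel is
stored as the plain vowel followed by an apostrophe, and is restored when that apostrophe is not
followed by another letter. The names GRAVE, encode_ascii and decode_ascii are hypothetical.

```python
import re

# Assumption for illustration: only lowercase grave-accented vowels are handled.
GRAVE = {"a": "à", "e": "è", "i": "ì", "o": "ò", "u": "ù"}
PLAIN = {accented: plain for plain, accented in GRAVE.items()}

# A vowel followed by an apostrophe that is not followed by a letter
# or by another apostrophe (i.e. the apostrophe closes the word).
_WORD_FINAL = re.compile(r"([aeiou])'(?![A-Za-z'])")


def encode_ascii(text: str) -> str:
    """Store each grave-accented vowel as vowel + apostrophe (7-bit safe)."""
    return "".join(PLAIN[ch] + "'" if ch in PLAIN else ch for ch in text)


def decode_ascii(text: str) -> str:
    """Restore accented vowels from the vowel + apostrophe form."""
    return _WORD_FINAL.sub(lambda m: GRAVE[m.group(1)], text)


if __name__ == "__main__":
    stored = encode_ascii("città è")
    print(stored)
    print(decode_ascii(stored))
```

A decoder this simple would also turn a genuine apostrophe such as the one in "po'" into an
accent, which is the kind of ambiguity that the language rules and error handling mentioned above
would have to resolve.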

[0006] A text encoding method is provided whereby certain combinations of characters in a stream
of text input events are converted to other characters in the output stream, in consideration of
the available input device, the input and output character sets, text context, language rules,
character input timing information, and custom parameters. Several ways to interface with the
host environment are considered. Custom parameters, both to configure the operation of the
invention, as well as to update the language rules and the database of character sets, can be
entered by means of a dedicated interface, or by entering appropriate data into the input stream.

[0007] The present invention provides a method and apparatus for encoding diacritical marks,
apostrophes and other word-related signs, optionally correcting any errors that are found. The
error management part of the invention provides automatic error correction of accents,
apostrophes and other signs used by the encoding techniques described here according to proper
grammatical rules. In one embodiment, the invention may be utilized with languages wherein hints
and activator event sequences provided in the input stream, e.g., by the user, are, alone, not
sufficient to define a character in an unambiguous and error-free way. Accent encoding
limitations are common both to keyboards and to character set codes, and both can be treated as
the source of a text input stream with the present invention. The present invention is also
capable of being applied in one embodiment to overcome the limitations of both keyboard input
data, and text file data, as well as other, similar text streams. In an embodiment wherein the
invention is utilized in real time, the present invention eases typing of text in Italian and in
languages with similar properties, making it possible to reduce the number of keys on a keyboard
normally required for typing text in such languages, as well as allowing for a keyboard not
specifically designed for such languages to be used, and virtually eliminating errors involving
diacritical marks, while providing for simple handling of exceptions. For Italian, an embodiment
of the invention specifies different types of logic that can be applied to resolve specific
ambiguities and errors typical of Italian writing. This invention can also be very useful for
German, Spanish, and other languages in which such logic is not necessary, for example, because
hints and activator event sequences present in the text input stream are sufficient to
unambiguously define a character, but, for reasons such as the lack of certain national
characters in the keyboard or character set, a simpler way to input national characters than the
methods currently in use is desirable. Additionally, the present invention provides for
different ways to easily program and input characters that may not yet be encoded on a keyboard
or character set such as, for example, the symbol for the euro currency.

[0008] In one embodiment, the present invention provides a simpler set of rules that can be
implemented in real time even on the slower systems. In an alternative embodiment, a more complex
set of rules may be implemented providing more options for more powerful and professional
systems. The present invention, in one embodiment, provides for the encoding, decoding and
editing of text in Italian and similar languages using standard 7 bit ASCII character codes,
thereby reducing text complexity and storage requirements compared to encoding methods which
employ 8 or more bits of information per character. The present method provides for the
automatic correction and processing of text streams employing 7, 8 or more bits of significant
character information by automatically recognizing factors such as the character encoding set
and the language of the text, and appropriately applying the encoding method. The method is
capable of normalizing text to a standard format so that it can more effectively be indexed or
used for comparisons and searches in applications such as Internet search engines, or the search
functions in word processing and database applications.
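
The normalization idea can be sketched in Python as follows. This is only an illustration under
stated assumptions: the "standard format" chosen here (diacritics removed, apostrophe and grave
characters dropped, case folded) and the function name search_key are hypothetical and are not
taken from the invention. The calls to unicodedata are from the Python standard library.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold accented and apostrophe-encoded spellings to one comparison key."""
    # Split each accented letter into base letter + combining mark (NFD).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: apostrophe and grave characters used as accent stand-ins
    # carry no meaning for search purposes, so they are removed as well.
    return stripped.replace("'", "").replace("`", "").casefold()


if __name__ == "__main__":
    # Both spellings of the same word map to the same key.
    print(search_key("Perché"), search_key("perche'"))
```

Dropping every apostrophe also merges forms such as "l'amico" and "lamico", so a production index
would likely need a more selective rule.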

[0009] It is to be understood that both the foregoing general description and the following
detailed description are exemplary and explanatory only and are not restrictive of the invention
as claimed.

[0010] The accompanying drawings, which are incorporated in and constitute a part of the
specification, illustrate an embodiment of the invention and together with the general
description, serve to explain the principles of the invention.


BRIEF DESCRIPTION OF THE DRAWINGS 

[0011] The numerous objects and advantages of the present invention may be better understood by
those skilled in the art by reference to the accompanying figures in which:


FIG. 1 is a block diagram of an overall system level embodiment of the present invention; 
FIG. 2 is a block diagram of a computer system capable of tangibly embodying the present invention;
FIG. 3 is a flow diagram of a method for processing text input in accordance with the present invention;
FIG. 4 is a flow diagram of a method for processing text in accordance with the present invention;
FIG. 5 is a flow diagram of a method for processing text in accordance with the present invention; and
FIG. 6 is a flow diagram of a method for processing text in accordance with the present invention.
DETAILED DESCRIPTION OF THE INVENTION

[0012] Reference will now be made in detail to one or more embodiments of the invention,
examples of which are illustrated in the accompanying drawings.

[0013] Referring now to FIG. 1, a block diagram of an overall system embodiment of the present
invention will be discussed. System 100 includes a text input system 110 that includes at least
one or more of several means by which text or character data may be provided as input to
processing system 126. Text input system 110 may comprise, for example, a keyboard 112 with
which a user is able to manually enter or type text or characters to provide a text or character
input stream, or a file 114 in which text or characters are stored in a format that is capable
of being read, interpreted and processed by processing system 126. Furthermore, input system 110
may include a microphone 116 coupled to a speech-to-text engine 118 such that words or
utterances spoken by a user are processed into a text or character stream that is capable of
being interpreted and processed by processing system 126. Additionally, input system 110 may
include a graphical image file 120 generated by optically scanning a text document that is then
processed by an optical character reader that is capable of producing text or characters that
are capable of being interpreted and processed by processing system 126.

[0014] Processing system 126 may be any type of system that is capable of processing and editing
text or character input. In one embodiment, for example, processing system 126 includes an
operating system 128 for controlling an application 130 that is capable of processing and
editing a text or character stream provided to processing system 126. For example, application
130 may be a standard word processor such as MICROSOFT WORD for running under operating system
128 that may be, for example, MICROSOFT WINDOWS 98, MICROSOFT WINDOWS NT, MICROSOFT WINDOWS ME,
or MICROSOFT WINDOWS 2000, all of which are available from Microsoft Corporation of Redmond,
Washington. As text or character data is processed by application 130, operating system 128 is
capable of causing the resulting output of application 130 to be provided to output system 132.
Output system 132 may include, for example, display 134 for displaying the output of application
130 in a format readable by a viewer, file 136 for storing the output of application 130 for
later retrieval by operating system 128, or a storage database 138 wherein the output is stored
in a format readable by other applications or by other computer systems.

[0015] In operation of the present invention, a text interpreter 124 receives an incoming text
or character stream provided by input system 110 and processes the text or character stream in
accordance with predetermined text processing rules. Text interpreter 124 may be tangibly
embodied, for example, as a stand-alone hardware or firmware device connected between input
system 110 and processing system 126. Alternatively, text interpreter 124 may be directly
incorporated into one or more input devices 112-122 as hardware, firmware, software, or a
combination thereof. In a further alternative embodiment, text interpreter 124 may be
incorporated in processing system 126 as a hardware device, as firmware, as software or as a
combination thereof. For example, text interpreter 124 may be included as a portion or
subroutine of operating system 128 or application 130. Alternatively, text interpreter 124 may
itself be a stand-alone application that is capable of providing an output directly to output
system 132 or that is capable of being read and interpreted by application 130. In a further
alternative embodiment, text interpreter 124 is capable of operating simultaneously and in
conjunction with application 130. Thus, one having skill in the art would appreciate that the
placement of text interpreter 124 between text input system 110 and processing system 126 is for
example purposes and need not be limited to the position illustrated in FIG. 1. As alternative
embodiments, text interpreter 124 may be incorporated within text input system 110, for example
being built into keyboard 112, or may be incorporated into processing system 126, for example as
part of either operating system 128, application 130, or as a self contained hardware device,
firmware, or routine or process running on processing system 126.

[0016] Referring now to FIG. 2, a block diagram of a computer system that is capable of tangibly
embodying the present invention will be discussed. Computer system 200 is capable of
implementing, at least in part or in whole, text processing system 100, or any portion thereof,
as discussed with respect to FIG. 1. Computer system 200 includes a processor 210 for processing
digital data. Processor 210 may comprise, for example, a complex instruction set computing
(CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long
instruction word (VLIW) microprocessor, a digital signal processor (DSP), a combination of
processors, or the like. A bus 224 couples to processor 210 for transmitting signals between
processor 210 and other components, systems, or devices of computer system 200. A read-only
memory (ROM) 212 is coupled to bus 224 for storing information that is intended not to be
rewritten, or only rewritten infrequently. A random access memory (RAM) 216 couples to bus 224
for storing information that can be dynamically written or read by processor 210. ROM 212
includes basic input-output system (BIOS) routines for initializing computer system 200 and
loading operating system (OS) 218 into RAM 216 at startup, and for facilitating the transfer of
information among the devices of computer system 200. Operating system 218 may be loaded from a
hard disk drive 232 coupled to bus 224 via hard disk drive controller 230, in which case
operating system 234 is the same as operating system 218. Likewise, RAM 216 may store one or
more programs 220 and one or more files 222 that may be loaded from hard disk drive 232, in
which case program 236 and file 238 are the same as program 220 and file 222, respectively. A
display adapter 226 couples to bus 224 for displaying a video signal received via bus 224 on
display 228.
Computer system 200 may include one or more removable storage medium device controllers 240 for
controlling one or more removable storage medium drives 242 that are capable of reading from or
reading from and writing to a removable storage medium 244 on which a program 246 or a file 248
may be stored. For example, removable medium may include, but is not limited to, a compact disk
read-only memory (CD-ROM) or a writable CD-ROM, a floppy disk, an optical disk, an
optical-floppy disk, a digital versatile disk (DVD or DVD ROM) or a writable DVD, laser disk,
magnetic tape (e.g., reel or cassette), removable hard disk drive, semiconductor memory (e.g.,
flash memory card or memory stick), or the like. An input/output (I/O) controller 250 is coupled
to bus 224 for connecting computer system 200 to one or more input, output, or input/output
devices such as modem 252, I/O device 254, mouse or graphical user interface (GUI) device 256,
keyboard/keypad 258 or the like. I/O controller 250 may provide one or more ports such as a
serial port, parallel port, Universal Serial Bus (USB) port, or the like. I/O device 254 may
include any one or more I/O devices such as a touch screen input device laid over display 228
for operating as a GUI device in conjunction with a GUI based operating system. Real-time clock
260 provides one or more timing signals for synchronizing the operation of the devices of
computer system 200. A network adapter 262 is capable of coupling computer system 200 to a
remote system 266 via network 264 such as a local area network (LAN) or intranet. Likewise,
modem 252 is capable of coupling computer system 200 to a remote system 266 via a remote network
such as a wide area network 268 or a world-wide network such as the Internet. Remote system 266
may be coupled to a storage medium 270 on which a program 272 or file 274 is stored that may be
transferred from remote system 266 via network 264 or remote network 268 to computer system 200
and stored, for example in RAM 216, hard disk drive 232 or removable medium 244. In one
embodiment, computer system 200 and remote system 266 may implement a client-server arrangement
in which the processing of an application may be divided between one of computer system 200 or
remote system 266 and the other. Computer system 200 may be a client and remote system 266 may
be a server, or vice-versa. Text interpreter 124 of FIG. 1 may be implemented with computer
system 200 as a program of instructions executable by processor 210, or may be implemented as an
I/O device 254 coupled in-line with another input device (e.g., keyboard/keypad 258) or
operating in parallel therewith.

[0017] In one embodiment of the present invention, text system 100 is capable of processing an
incoming text string according to the rules of the Italian language using a standard input
device or system such as a standard, English based keyboard. Although the present invention is
particularly directed to the Italian language for example and discussion purposes, one having
skill in the art would appreciate that the teachings of the present invention may be applied to
many other languages, including but not limited to French and Germanic languages. It is not
intended that the present invention be limited to Italian or any other specific language.


Functions of The Invention 

[0018] In accordance with the present invention, system 100 is capable of processing a stream or
file of text data. The data can for example be keyboard data as it is typed (connecting to the
operating system as a keyboard hook, or through interfaces for input method editors, or through
interfaces for assistive technologies, or physically connecting to the keyboard hardware, etc.),
or data being read from an existing file, or data being accessed through a standard interface
provided by programs like MICROSOFT WORD, or computer clipboard data (which the user has copied
there, system 100 processes, and is then ready for being pasted back). A hook is defined in at
least one embodiment of the invention as a location in a routine or program in which the
programmer can connect or insert other routines for the purpose of enhancing functionality. A
keyboard hook is defined as a hook routine or program that implements the connection or
insertion of routines using keyboard input. System 100 has access to all input data, and it can
also affect the output data to apply certain changes, which are the objective of this invention.
How this is accomplished is a function of the implementation. For example, if system 100 is
implemented in the same program that writes the data, for example a file processor or a word
processor, then system 100 can directly write the processed output data, modified as necessary.
If however system 100 is implemented as a keyboard hook, especially in an interactive context
where the user expects to immediately see every character typed, either in software by inserting
itself in the operating system's or the application's input stream, or as hardware, e.g. as a
device plugged between keyboard and computer, then it may change the output data by simulating
the input of appropriate backspace or cursor movement characters, followed by new output data,
to change data that already resulted in screen display. Even when system 100 is passive, i.e.,
it does nothing to actually modify the text, it is busy collecting context information, i.e., it
maintains a local buffer of all recent input. This is necessary to know the full word that is
currently being written, and also optionally to understand the context in which the last word or
character appears, for example to identify an apostrophe character that can be expected to be
part of a closing quote because system 100 has previously recognized an opening quote, so that
it is not confused with an apostrophe that may have some other meaning that would affect the
operation of this system. If system 100 is implemented in a way that it has direct access to
text context information, for example as part of a word processor, or through an interface to a
word processor that gives such access, e.g., MICROSOFT WORD scripting interface, or as a file
processor that only deals with a continuous input stream, then context information can also be
acquired directly on the text data itself, without needing to keep a copy of the recent data in
a local buffer. There are however cases in which the text input stream is not linear, or it can
be disrupted, for example when system 100 is implemented as a keyboard hook, and the user
presses the Cursor Up or Down keys, or uses the mouse, to reposition the cursor. These cases can
be detected (by detecting keyboard, mouse and other input events that affect the text input
position), but it is not always possible to reconstruct the new local context information (e.g.,
system 100 does not know where the cursor is, after a Cursor Up). In some cases the new context
can be reconstructed by on-screen character recognition.

EP 1 093 058 A1

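
The backspace technique described in paragraph [0018] for a keyboard hook can be sketched in
Python as follows. This is a minimal illustration, not the implementation of system 100: the
function names are hypothetical, the locally buffered text is assumed to match what is on
screen, and a real hook would inject key events through an operating system interface rather
than build a string.

```python
BACKSPACE = "\b"


def correction_keystrokes(displayed: str, corrected: str) -> str:
    """Return the simulated input that turns `displayed` into `corrected`.

    Only the differing tail is erased with backspaces and retyped,
    so text that is already correct on screen is left untouched.
    """
    common = 0
    for shown, wanted in zip(displayed, corrected):
        if shown != wanted:
            break
        common += 1
    return BACKSPACE * (len(displayed) - common) + corrected[common:]


def apply_keystrokes(displayed: str, keystrokes: str) -> str:
    """Simulate how an application would react to the injected keystrokes."""
    out = list(displayed)
    for ch in keystrokes:
        if ch == BACKSPACE:
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    keys = correction_keystrokes("perche'", "perché")
    print(repr(keys))
    print(apply_keystrokes("perche'", keys))
```
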
[0019] System 100 is character-oriented, i.e., it becomes active when certain characters are
encountered in the input stream. Implemented in a keyboard input context, it reacts to certain
keyboard keys. No special conversion keys are necessary. Rather, system 100 uses context
information to give special meaning to an otherwise possibly standard (because it may also
appear in the text) input character. In the algorithms of this system, context information is
combined with the most recent input character, and also, optionally, in a dynamic way with the
number of times the last input character occurs in a row, resulting, in the case of keyboard
input, in a dynamic sequence.
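
Combining context with the number of consecutive occurrences of the last input character can be
sketched in Python as follows. The sketch makes several assumptions that are not taken from
paragraph [0019]: the apostrophe is the special character, the table ACCENT_FORMS and its
ordering are hypothetical, only lowercase vowels are handled, and the selection wraps around when
the run is longer than the list.

```python
ACTIVATOR = "'"

# Hypothetical table of accented forms, ordered by number of activator presses.
ACCENT_FORMS = {
    "a": ["à", "á", "â"],
    "e": ["è", "é", "ê"],
    "i": ["ì", "í", "î"],
    "o": ["ò", "ó", "ô"],
    "u": ["ù", "ú", "û"],
}


def trailing_run(buffer: str, ch: str = ACTIVATOR) -> int:
    """Count how many times `ch` occurs in a row at the end of the buffer."""
    n = 0
    while n < len(buffer) and buffer[-1 - n] == ch:
        n += 1
    return n


def interpret(buffer: str) -> str:
    """Replace 'vowel + run of activators' with the accented form it selects."""
    run = trailing_run(buffer)
    if run == 0 or run == len(buffer):
        return buffer                      # nothing to do, or no preceding character
    base = buffer[-run - 1]                # context: the character before the run
    forms = ACCENT_FORMS.get(base)
    if forms is None:
        return buffer                      # e.g. "l'" keeps its ordinary apostrophe
    return buffer[: -run - 1] + forms[(run - 1) % len(forms)]


if __name__ == "__main__":
    print(interpret("caffe'"), interpret("perche''"))
```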

[0020] System 100 described herein provides the ability to affect not only the current
character, but also previous characters. System 100 described here implements "smart"
procedures to process the combined context and input data, and generate output data in a way
that results in new, reliable, intuitive and extremely useful text input methods which have
practical applications in Italian, German, French and other languages which use Latin characters
plus diacritical marks (but also to generate some special non-Latin characters). The present
system is in one embodiment focused on the input of certain characters while the single
characters are being written, and in particular Italian accented vowels, but also characters
with diacritical marks in other languages, and also certain non-word characters (currency
symbols, etc.). Only in certain cases does system 100 take action at the end of a word to
re-correct or further modify a previous mid-word correction. This may happen for example when
system 100 detects that an apostrophe originally interpreted to indicate an Italian accent was
instead meant to be an English possessive, which can only be recognized after a non-word
character follows an "s" which follows an apostrophe. In general, system 100 intervenes in real
time on each character. The definition of a word herein encompasses any word, punctuated or
unpunctuated, accented or unaccented, contracted or uncontracted, with or without liaison, or
any letter, portion, character, or subcombination thereof.
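
The end-of-word re-correction of an English possessive can be sketched in Python as follows.
This is a hedged illustration only: it assumes that the earlier mid-word correction produced a
grave-accented lowercase vowel, and the names UNACCENT, POSSESSIVE and recorrect_possessive are
hypothetical.

```python
import re

# Assumption: the mid-word correction turned "vowel + apostrophe" into one of these.
UNACCENT = {"à": "a", "è": "e", "ì": "i", "ò": "o", "ù": "u"}

# An accented vowel, then "s", then a non-word character at the end of the buffer.
POSSESSIVE = re.compile(r"([àèìòù])s(\W)$")


def recorrect_possessive(text: str) -> str:
    """Undo an accent when the following "s" plus a non-word character
    shows that the apostrophe was an English possessive."""
    return POSSESSIVE.sub(
        lambda m: UNACCENT[m.group(1)] + "'s" + m.group(2), text
    )


if __name__ == "__main__":
    # Typing "Luca's " first yields "Lucà", then "Lucàs", and the space
    # is the non-word character that triggers the re-correction.
    print(recorrect_possessive("Lucàs "))
```

The rule fires only once the non-word character arrives, which matches the observation above
that the possessive cannot be recognized any earlier.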

[0021] In one embodiment of the invention adapted to the Italian language, when writing or
otherwise inputting text to any application, an activator event is used, which for the case of
Italian is the apostrophe character. During character input, a correct or optimal accenting of a
word is provided upon a first encounter or entry of the apostrophe character, for example when
the apostrophe key is actuated during typing. If an alternative accented word form is desired,
for example while inputting a French word during writing of Italian text, an additional
encounter or actuation of the apostrophe character will select an alternatively accented word.
The apostrophe character may further be utilized in various ways to override automatic actions
provided by system 100, and with automatic detection and re-correction or further modification
of "'s" possessive word forms in English. The software of system 100 works in one embodiment by
functioning as an add-on on the keyboard input stream, but can also be directly embedded in text
editing software, in the operating system, and on text systems of hand held devices, for
example. It should be noted that an activator event may be indicated by one or more events or
one or more sequences or combinations of characters, input events, keyboard actuations, etc., so
that the term activator event may be defined as encompassing these several events and sequences.
For example, an activator event is defined in one embodiment as any key or character on a
standard keyboard or in a standard character set (e.g., 7-bit ASCII). In one particular
embodiment of the invention, the apostrophe character, or the apostrophe key, is defined as an
activator event. An activator event input, such as the actuation of the apostrophe key or the
input of an apostrophe character, will cause system 100 to detect an activator event and provide
an appropriate response, e.g., modification of a word immediately preceding the activator event,
initiation of an IME loop as described herein, etc. In another embodiment, an activator event is
defined as an activator character that is preceded by another character so that the two, when
appearing or occurring in combination, result in system 100 detecting an activator event.
For example, an apostrophe character preceded by any vowel character is detected by system 100
as an activator event so that system 100 provides an appropriate word modification or other
response. In another embodiment, an activator event is defined as two characters appearing or
input in succession, for example two vowels appearing in succession result in system 100
detecting an activator event so that an appropriate modification of the word or other response
is provided. In one particular embodiment, an activator event or at least one character of an
activator event is keyed or otherwise input in succession wherein each successive actuation or
input of the activator event causes system 100 to initiate an additional appropriate response.
For example, a vowel followed by a single apostrophe character or event causes system 100 to
modify the word so that a first accented form of the vowel is provided, e.g., using a grave
accent. An additional input or actuation of an apostrophe causes system 100 to provide a second
accented form of the vowel, e.g., using an acute accent. A yet additional input or actuation of
an apostrophe causes system 100 to provide a third accented form of the vowel, e.g., using a
circumflex accent, and so on. System 100 may continue to provide additional accented forms until
an entire list of accented forms is provided. At the end of the list, in one embodiment, system
100 again provides the first accented form of the word or vowel so that the list is effectively
circular, or closed, optionally including an unmodified form of the word or vowel, with or
without the ending apostrophe activator event. In an alternative embodiment, the list is open so
that at the end of the list, system 100 provides the original, unmodified form of the word or
vowel, with or without the ending apostrophe activator event, and system 100 does not go through
the list an additional time. In an alternative embodiment, system 100 detects an activator event
when an activator event or input event lasts for a predetermined duration of time, in
combination with another character, or alternatively independently of other characters. For
example, when an apostrophe character is input, but is actuated or keyed for a duration less
than the predetermined duration of time, system 100 does not detect an activator event, and no
modification of the word or other additional processing is provided. On the other hand, when an
apostrophe character is actuated for a time at least equal to the predetermined period of time,
system 100 detects an activator event and provides an appropriate modification or other
response. In a particular embodiment of the invention, when an activator event is input for
additional periods of the predetermined duration of time, each additional period causes system
100 to detect an additional activator event and to provide an additional modification or other
response in a manner similar to that of where an activator event is activated and detected
several times in succession as dis-
cussed, above. For example, if a currency character is defined as an activator event, then when a user holds down a 
representative currency character key for a duration of time, after a firat period a first cunnency symbol Is provide, after 

15 a second period a second cunency symbol Is provided, and so on, in either a dosed or an open loop, unUf a dtolred 
cun^ncy character is provkied at which tvrte the user may release actuation of the acthrator event so that the currently 
provided cun-ency symbol Is maintained. Thus, an activator event may be defined to enconrtpass a key actuatior% singly 
or in a combination, a key actuation maintained (e.g., pressed) for a predetermined duration, a character In an input 
stream or text tils, singly or In combinatbn with other characters, and so on. An activator event therefore encompasses 

20 any one or more of the following events, alone or in combination: the same key pressed twtoe, the same character 
encountered twice, a predetemnined key, a predetenmined character, a predetermined key or character preceded by at 
least one or mora other predetermined keys or characters, or alternatively succeeding the predetermined key or char- 
acter. a predetermined key heid down or otherwise actuated or maintained for a predetermined duratK>n, opttonally 
being preceded by another predetennined key or character, an accented key or character, a vowel key or character, an 

2s accented vowel or character, and so on. Thus an activator event encompasses input data end input events. Any one or 
more of activator events combinations as descrSied herein, or variattons and combinations thereof, or in addition to 
those described herein, may be recognized and detected by system 100 without provUIng substantial change to the 
present invention. Any one or more of the activator events or combinations as described herein may be optionally 
applied to any one or more of the embodrments or language implementations of the Invention described herein or sim- 

50 liar to those described herein without providing substantial change thereto. 
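
The closed and open list behaviours described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names, the fixed grave, acute, circumflex order, and the mapping from hold time to a number of activator events are all assumptions made for illustration.

```python
import unicodedata

# Combining grave, acute and circumflex marks, in the example order used in the text.
MARKS = ("\u0300", "\u0301", "\u0302")


def accented_forms(vowel):
    """Hypothetical helper: the list of accented forms for one vowel."""
    return [unicodedata.normalize("NFC", vowel + mark) for mark in MARKS]


def form_after(vowel, activations, closed=True):
    """Form provided after a number of successive activator events."""
    forms = accented_forms(vowel)
    if activations <= 0:
        return vowel
    if closed:
        # Closed (circular) list: after the last form, start again at the first.
        return forms[(activations - 1) % len(forms)]
    # Open list: past the end, the unmodified vowel is provided and kept.
    return forms[activations - 1] if activations <= len(forms) else vowel


def activations_from_hold(held_ms, period_ms):
    """Assumed mapping: one activator event per full period the key is held."""
    return held_ms // period_ms
```

Under these assumptions, `form_after("e", 4)` wraps around to the grave form in the closed variant, while `form_after("e", 4, closed=False)` returns the plain vowel.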

[0022] Using different variants and combinations, the two aspects of the invention are: using a key or character, an
activator event, as part of an interactive, dynamic input method editor system, to handle accents in foreign words, and
to otherwise write any combination of accents and special characters as desired. Each time an activator event is
actuated in relation to a specific vowel or other context, system 100 generates a new character or character
combination, in a loop. The order or hierarchy in which the characters are generated can be constant, context-based,
custom, or experience-based, that is, dependent upon previous selections. The activator event or key in one embodiment
is the occurrence of an apostrophe after a vowel in the Italian language, or alternatively an "e" after "a", "o" or "u"
in German. The activator event that is used can be a function of the language for which system 100 is utilized,
according to the letter and accent combinations that appear in the particular language applied.

[0023] The present invention automatically places the correct Italian apostrophe or accent on a vowel, based upon
encounter or actuation of an activator event. In particular, when the apostrophe is used as the activator event, there
may be cases where context-based processing is utilized to determine whether an occurrence of an apostrophe is
word-related or not, that is, intended for another purpose, for example as an opening or closing single quotation mark.
In the event it is determined that an occurrence of an activator event is word-related, the English "'s" possessive is
recognized and accounted for where appropriate. Also, with certain types of actions using an activator event where an
apostrophe is entered as a recognized mistake, system 100 is capable of deleting the entry, or deleting the entry and
replacing it with a space character, depending on Italian writing rules, for example, where the apostrophe can be used
as part of a word, or between words rather than as an accent.

Discussion of The Italian Language

[0024] Compared to other languages, the relationship between Italian writing and pronunciation is quite easily
specified by rules that provide relatively intuitive spelling and easy pronunciation of new words. One exception where
most errors occur is related to the proper placing of accent and apostrophe signs in written text. Most Italian words
end with a vowel. The pronunciation of Italian is such that the primary stress usually falls on the penultimate vowel,
the second vowel counting from the end of the word, i.e., the syllable before the last one. Accents are used to indicate
an exception to this rule. In Italian dictionaries, and in some cases also to avoid ambiguities between words that have
different meanings but differ only by the primary stress (e.g., tùrbine and turbìne), accents are used on vowels inside
a word. In general writing, however, accents are used only on vowels at the end of a word, and indicate that the primary
stress is on the last syllable (e.g., però). On some words (e.g., qui) the primary stress falls on the ending vowel, but
no accent sign is used, a frequent cause of errors when writing as there is no specific rule; one must learn all the
exceptions. Italian words are sometimes truncated (also referred to as elision), and in this case an apostrophe is used
at the end of the word to indicate that a part of the word is "missing". If the last character of the truncated word is
a vowel, the primary stress usually defaults to that syllable, which is marked by an apostrophe after it (not by an
accent). In consideration of the evolution of language, writing rules change over time to accept truncated words as new
words, which usually means that they are not written with an apostrophe any more, but rather with an accent, or with no
sign at all. In practice, truncated words are sometimes so common that writers are not sure if the word is still
considered truncated or not, leaving a doubt on whether an apostrophe should be used, or an accent, or no sign at all.
For example, the Italian word poco is frequently truncated to po'. It is a common mistake to write it as pò. Another
word, piede, has a truncated form, originally written as pie', now commonly accepted as piè in spite of the fact that it
is less frequent than po'. Similar ambiguities also affect truncated words which are written without any sign on or
after the last character, such as quale, which becomes qual, and frate, which becomes fra, whereas it is a common
mistake to write qual' or fra'. In some cases an apostrophe is used if the following word is of feminine gender but not
if the word is masculine (e.g., una altra becomes un'altra, but uno altro becomes un altro). On certain other words, for
example weekdays ending in "ì" such as lunedì, it is a common mistake to omit the final accent.

[0025] A peculiarity of the writing of truncated words ending with an apostrophe is that if the last character of the
truncated word is a consonant, then the apostrophe also acts as a spacing character between that word and the following
one, i.e., no space character is used between the two words. A text processing system in accordance with the
present invention that removes an incorrect apostrophe sign takes this into consideration in order to place an
appropriate space character where necessary.
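
The spacing rule above can be illustrated with a short Python sketch. It is a hedged illustration, not the method of the invention: the word list `NO_APOSTROPHE`, the function name, and the regular expression are assumptions, and a real system would rely on its dictionary and rules instead.

```python
import re

# Hypothetical word list: truncated forms that are written without an apostrophe.
NO_APOSTROPHE = {"qual", "tal", "buon"}

# A run of unaccented letters immediately followed by an apostrophe.
WORD_APOSTROPHE = re.compile(r"\b([A-Za-z]+)'")


def remove_incorrect_apostrophes(text):
    def fix(match):
        word = match.group(1)
        if word.lower() not in NO_APOSTROPHE:
            return match.group(0)  # the apostrophe is legitimate here, keep it
        next_char = match.string[match.end():match.end() + 1]
        # If the apostrophe also served as the separator between two words,
        # a space character has to take its place.
        return (word + " ") if next_char.isalpha() else word

    return WORD_APOSTROPHE.sub(fix, text)
```

With this assumed list, "qual'era" is rewritten with a space in place of the apostrophe, while "un'altra" is left unchanged.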

[0026] Different diacritical signs are used for writing in Italian. In addition to the apostrophe, the grave accent (as
in è) and the acute accent (as in é) are used in everyday writing. The Italian National Standards Body (UNI) standard,
UNI 6015-67 "Compulsory Stress Mark In The Italian Language Orthography", first published by the Italian National
Standards Body in 1967, sets the rules by which grave and acute accents have to be placed on vowels in certain words.
The circumflex accent (as in î) is also sometimes used, but like the use of grave and acute accents in the middle of a
word it is generally associated with a more sophisticated and in part old-fashioned writing style, whereas in modern
Italian the trend is to let certain ambiguities be resolved by the context in which the word appears, and not to use
grave and acute accents inside words (but only at the end), or circumflex accents. The normal Italian writer is not
expected to use such a style other than in exceptional cases, which could include the writing of French or Spanish
words in an Italian context, but the invention described herein allows for input and processing of such custom
characters as well.
[0027] Typically, even a skilled but non-professional writer of Italian does not know when to put a grave accent and
when instead to put the acute accent. In general, this is not taught at Italian schools; instead a single sign having
the appearance of a small opening parenthesis rotated by 90 degrees counterclockwise (similar to the "breve" character
used in decimal positions 728 and 774 of the Unicode character set, i.e., "˘") is used as a "simplified fit-all accent
sign". This sign, used exclusively in handwriting, is not defined by UNI 6015-67, and does not exist in printed text or
on Italian keyboard layouts.

[0028] The use of proper acute and grave accents is in general always found in print but is in general only learned
as part of specific editorial, journalistic and printing training and studies. The fact that the Italian school system
focuses on handwriting but not printing, and that personal computers are increasingly giving non-professional writers
the ability to put words in print, results in an increasing degradation of the quality of printed words, which this
invention aims to solve. The use of the apostrophe at the end of words, which historically indicates a truncation of an
originally longer word, is in general taught at school, but remains a common source of mistakes in writing. Like an
accent, an apostrophe at the end of the word adds emphasis to the last vowel of the word. This same emphasis is clearly
reflected in the standard spoken language. This means that, on average, an Italian knows well when a word ends "either
with a grave accent or with an acute accent or with an apostrophe", because this is how the word is spoken, but when
writing the choice does not come intuitively. Certain words have a phonetic emphasis on the terminal vowel, but no
graphical sign (accent or apostrophe) in the printed word (as in me and qui). This exception, whereby the printed form
does not reflect the phonetic stress, is another frequent source of mistakes, so that accents and apostrophe signs are
sometimes placed where they should not be. Like all languages, Italian is in constant evolution. This means that there
are cases and contexts, usually determined by editorial policies, in which certain words are written in a different way
than, for example, the UNI specification indicates. An example of this is the word piè, which some prefer to write as
pie' (as if it were a short form of piede, which historically it is). Other choices involve the use of accents, whereby
for example some newspapers prefer to use acute accents in some cases where the UNI rules would require a grave accent,
or vice versa. Another frequent source of diversity is the use of accents on capital letters. Some editorial styles
prefer (often due to technical limitations) not to put accents on all-capital words, putting instead an apostrophe at
the end of the word in place of a final accent, and simply removing accents in the middle of capital words, as is
sometimes done in French. MICROSOFT WORD 97 includes an option to allow for accented uppercase in French, but no
specific options for Italian. In these cases, where official rules are lacking, or where these differ from editorial
choices, the most important rule becomes consistency, i.e., not to use one style at one time and a different style at
another time in the same context. The invention described herein can be applied and programmed to enforce consistency
in consideration of different preferences.

[0029] Ever since the introduction of typewriters, it has been a common convention in Italian to use the apostrophe
sign after a vowel in those cases in which the proper accented vowel is not available on the keyboard or in the
character set being used. Considering the needs of a very simple style of Italian writing (e.g., for personal
correspondence), at least 7 accented characters are needed (à, è, é, È, ì, ò and ù). Anybody using all-capital words or
sentences (e.g., in titles) will also need to use an additional 5 capital accented letters (À, É, Ì, Ò and Ù), bringing
the total to 12. More demanding writers and contexts need an additional 4 characters (î, Î, ó and Ó), for a total of 16
accented characters.
[0030] The main other contexts in which apostrophe characters are used in Italian writing are as quote characters (to
delimit a text, before and after it, as in 'text'), and after numbers (e.g., 5'2). In these cases, the apostrophe
character is sometimes used twice instead of a double quote character, which is in general preferable (e.g., ''text''
instead of "text"). An automatic text processing system must be able to recognize these cases, not only for example to
convert the quotes into the proper opening and closing characters (e.g., MICROSOFT WORD, which converts 'text' to ‘text’
and "text" to “text”), but also, in the particular case of Italian writing, to determine the likely intention of an
apostrophe character when there is ambiguity (in very rare cases a word may exist both with and without apostrophe or
accent) or insufficient data (e.g., no matching entry in the dictionary of the software). In system 100 described
herein, which can be applied so that pressing the apostrophe key once places the correct apostrophe or accent on a word
(e.g., perche' becomes perché), and pressing it again produces different variations (e.g., perchè, perche', perchè',
etc.), the recognition of a context where an ending quote has to be expected (because an opening single quote
appeared within a certain range before the current position, as in 'perche) allows system 100 to automatically interpret
the first input of the quote character as an acute accent ('perché), and the second input of the same character as the
closing quote ('perché'), instead of an attempt to write 'perchè (forcing an incorrect accent to be written). Similarly,
if a word for which system 100 cannot apply any Italian rules (e.g., a completely unknown word which is not even
recognized as a likely foreign word) is typed in a context where a single ending quote is to be expected, system 100 may
be programmed to propose as a first output character a closing quote, rather than an accent (e.g., 'dedededo abababo'
instead of 'dedededo abababò).
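
One way to picture the quote-context recognition described above is the following Python sketch. It is only an assumption about how such a check might look: the function name, the 40-character window, and the two regular expressions are hypothetical and are not taken from the specification.

```python
import re

# An opening single quote: an apostrophe at the start of the text or after
# whitespace, immediately followed by a word character.
OPENING_QUOTE = re.compile(r"(?:^|(?<=\s))'(?=\w)")
# A closing quote: an apostrophe followed by a non-word character or the end.
CLOSING_QUOTE = re.compile(r"'(?=\W|$)")


def expects_closing_quote(text_before, window=40):
    """True if an unmatched opening single quote lies within `window` characters."""
    start = max(0, len(text_before) - window)
    openings = [m.start() for m in OPENING_QUOTE.finditer(text_before)
                if m.start() >= start]
    if not openings:
        return False
    after_last_opening = text_before[openings[-1] + 1:]
    return CLOSING_QUOTE.search(after_last_opening) is None
```

When the check returns true, a system along these lines could reserve the second apostrophe input for the closing quote instead of cycling to the next accent variation.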

[0031] Apostrophes may also appear in Italian writing as part of a change of language context, which could be for
a single word, or for longer parts of text. System 100 described here can be programmed to recognize apostrophes
used in other languages, e.g., in German and English genitives and abbreviations (as in Eva's Apfel and eight o'clock),
which have no match in Italian. While the fact that German is an official language in Italy and English is the most
frequently used second language is one consideration, such a set of rules can improve the overall reliability of system
100 so that it produces little or no incorrect output even when processing (trying to apply Italian accent rules to)
long non-Italian texts of any language based on Latin writing.

Variations of the apostrophe character 

[0032] Some computer keyboards reflect the fact that the 7-bit ASCII character set contains both an "acute
apostrophe" and a "grave apostrophe" character (decimal codes 39 and 96, respectively), and, accordingly, have keys to
input both characters. This is a common cause of inconsistencies when writing, since it is desirable that in a text only
one type of character be used to represent the apostrophe (but not for opening and closing single quotes, where,
depending on the font being used, the two characters are appropriate to differentiate between opening and closing
single quotes). System 100 described herein can be programmed to convert, for example, the apostrophe character
entered with the grave key to the "acute apostrophe" character, leaving the character unchanged if it is used as a
single quote character. Some keyboards and character sets have an even wider range of characters and keys that can be
used, deliberately or by mistake, for the same purpose.
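
A minimal Python sketch of such a conversion follows. The rule used here, that a grave character directly after a letter is word-related while one at the start of a word is an opening quote, is a simplifying assumption for illustration, as is the function name.

```python
import re

ACUTE_APOSTROPHE = chr(39)   # '
GRAVE_APOSTROPHE = chr(96)   # `

# A grave apostrophe that directly follows a letter (assumed to be word-related).
WORD_RELATED_GRAVE = re.compile(r"(?<=[^\W\d_])" + re.escape(GRAVE_APOSTROPHE))


def normalize_apostrophes(text):
    """Convert word-related grave apostrophes; leave opening quotes unchanged."""
    return WORD_RELATED_GRAVE.sub(ACUTE_APOSTROPHE, text)
```

In this sketch a grave character that opens a quoted word is not preceded by a letter and is therefore kept as typed.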

Discussion of Other Languages 


[0033] Other languages have in part needs similar to Italian, but cannot always be algorithmically solved with the
same accuracy. German for example has upper and lower case versions of "ä", "ö" and "ü", which are written as "ae",
"oe" and "ue" when these characters are not available. The special character "ß" (lower case only) is expanded to "ss"
when not available, as well as always in upper case ("SS"). Different techniques have been proposed to automatically
process German text files to add or restore the missing Umlaut characters, but none with the reliability that system 100
described here achieves for Italian and its special characters. The interactive mode of the present invention, where the
user could for example enter "o" and then repeatedly press the "e" key to toggle from "ö" to "oe", could be of great help
to combine automatic procedures with manual corrections during typing.
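
The German toggle described above might be sketched as follows in Python. The function name and the decision to alternate indefinitely between the two forms are assumptions; the text only states that repeated presses of "e" toggle between the Umlaut and the two-letter spelling.

```python
UMLAUTS = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}


def toggle_umlaut(base, e_presses):
    """Text shown after typing `base` followed by `e_presses` presses of "e"."""
    if base not in UMLAUTS or e_presses < 1:
        # Not a toggling context: the "e" keys are ordinary letters.
        return base + "e" * max(e_presses, 0)
    forms = [UMLAUTS[base], base + "e"]
    # Odd press counts give the Umlaut, even counts the two-letter spelling.
    return forms[(e_presses - 1) % 2]
```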
[0034] In Spanish, vowels may have an acute accent, and the apostrophe character is only used for quotes (it is not
part of words, as in Italian). This would make it possible to use the apostrophe key after a vowel to enter the vowel
with an acute accent. A similar sequence could also be used for other Spanish characters that are variations of
characters without diacritical signs, such as ñ. Apostrophe characters that are part of quotes, or English possessives,
could be recognized by the more generic procedures that are part of this system. Writing a vowel followed by an action
indicating an accent is more intuitive for the writer (as it is more similar to handwriting) than the system currently
employed on Spanish personal computers, which requires that the user first enters a "dead key" indicating the desired
accent, and then the vowel.

[0035] French employs an even greater variety of characters, as it uses acute, grave, circumflex and dieresis signs
on top of vowels, plus some other characters, like ç. Because of this variety, which requires a lot of keys on a
keyboard, the interactive use of this system could be of great advantage on a system with a reduced number of keys, also
possibly in combination with some language-oriented algorithms (as for the other languages discussed here).

Other Applications 


[0036] The above examples for Italian, German, Spanish and French illustrate how a certain text context followed
by a certain input results in a certain algorithmically modified output, so that system 100 can optionally be modified
in a dynamic fashion, and controlled by the user. An application of system 100 where repeated input of a certain key,
for example the dollar key, produces different, alternating currency symbols (e.g., euro, yen, pound, etc.) can be
implemented as a subset of system 100 described herein. In one embodiment, system 100 intercepts repeated inputs of
certain keys, and sends "fake" backspace characters in the input stream, followed by new characters, to provide the
desired character combination output. For example, system 100 may be utilized in lieu of or in conjunction with a euro
key where a special euro key is provided, the position of the euro key on computer keyboards being as of yet
undetermined and likely to change in the future. System 100 in accordance with the present invention is capable of
implementing a universal currency key.
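
The "fake" backspace technique can be pictured with the following Python sketch. The symbol order in `CURRENCIES`, the function names, and the small `render` helper that simulates an application consuming the stream are all assumptions for illustration.

```python
BACKSPACE = "\b"
CURRENCIES = ["$", "€", "¥", "£"]  # hypothetical order of alternating symbols


def currency_key_stream(presses):
    """Characters forwarded to the application for repeated key presses."""
    stream = []
    for n in range(presses):
        if n > 0:
            stream.append(BACKSPACE)  # erase the symbol written by the last press
        stream.append(CURRENCIES[n % len(CURRENCIES)])
    return "".join(stream)


def render(stream):
    """Simulate an application that applies backspaces to its text buffer."""
    shown = []
    for ch in stream:
        if ch == BACKSPACE:
            if shown:
                shown.pop()
        else:
            shown.append(ch)
    return "".join(shown)
```

Three presses produce a stream of five characters in this sketch, of which only the third symbol remains visible to the user.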

Non-Linguistic Factors

[0037] Utilization of Italian letters with diacritical signs exceeds the limits of the character sets and keyboards
originally designed for English. The original ASCII and EBCDIC character sets, still in use today, support none of these
characters. The present invention provides automatic conversion both from an accented Italian, which requires support
by a character set newer than ASCII, to an Italian using the standard ASCII character set, and back, restoring the
accented Italian characters from a 7-bit character set such as ASCII. The present invention effectively eliminates
accent-related inconveniences caused by the use of 7-bit bottlenecks that are still common in the computing world,
especially in consideration of the increasing interconnection of different systems.

Text Normalization 

[0038] When a user of an Internet search engine or dictionary lookup software enters a word or sentence, the
search key and the entries being searched should match. However, considering that for example calamità, calamitá
and calamita' are three different ways in which, in practice, the same word may be written, while the word calamita
(with no sign) is a different word with a different meaning, an advantage is provided where both the search key and the
text being searched are normalized to a common format, using system 100 in accordance with the present invention. Word
format normalization is provided using the same or similar rules that are applied in real time where text is being
typed, for example. The invention described herein normalizes text using the correct accent and apostrophe characters,
or using only apostrophe keys (i.e., only 7-bit ASCII text), and, whatever the desired format, is capable of converting
Italian text from one format to the other, without loss of information, and while maintaining a natural text
readability, i.e., without introducing control codes which are perceived by the human reader as "artificial" or
unreadable.
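
As a hedged illustration of normalizing a search key and the searched text to a common format, the following Python sketch maps a final grave or acute accent to a trailing apostrophe. It is a simplification and not the method of the invention: it deliberately collapses grave and acute accents, which suits matching but does not preserve the distinction a lossless conversion would keep, and the function names are assumptions.

```python
import unicodedata

FINAL_ACCENTS = {"\u0300", "\u0301"}  # combining grave and combining acute


def to_ascii_form(word):
    """Replace a final grave or acute accent with a trailing apostrophe."""
    decomposed = unicodedata.normalize("NFD", word)
    if decomposed and decomposed[-1] in FINAL_ACCENTS:
        return unicodedata.normalize("NFC", decomposed[:-1]) + "'"
    return word


def same_word(a, b):
    """Compare two spellings after normalizing both to the common format."""
    return to_ascii_form(a) == to_ascii_form(b)
```

In this sketch the three accented or apostrophized spellings of the example word compare equal, while the unmarked word remains distinct.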

Integration with Other Systems

[0039] Operating system support for input methods provides processing of user input regardless of the target
application. In one particular embodiment of the invention, one or more application-independent layers are provided by
some operating systems, for example MICROSOFT WINDOWS, as well as by some applications, for example MICROSOFT
WORD. An application-independent layer may consist, for example, of a set of functions dedicated to error detection
and correction. System 100 described herein is capable of integrating with such a set of error management functions.
System 100 can also be directly integrated at the application level, for example in a word processor. In this case, text
context information, as well as input data, is directly accessible to the method employed by the invention, which can
also directly produce output in the format used by the application itself.

[0040] When integration at the operating system level is not possible, and integration within the application is either
difficult or insufficient, techniques may be used to obtain text context information, to intercept user input, and to
then forward such processed input to the operating system or to an application in real time. For example, text context
information is acquired by system 100 by monitoring the keyboard and display activity. Such context information may be
used to apply different rules based on both the text context and the user input. The resulting output is then forwarded
either to the operating system, or to the keyboard control system, acting as if the user typed the data, or it is sent
directly to the application. On architectures where it is possible only to detect, but not to remove, the original
input stream, the text processing method may insert appropriate "cursor", "backspace" and "delete" characters, in
addition to new text characters, into the input stream in order to force applications to replace a series of input
events with a new series of processed events generated by the text processing function.

[0041] Several other cases of possible and useful integration of this invention are known. In a particular
embodiment, text search procedures, as used within word processing and database applications, as well as on the
Internet, utilize system 100 both with the search string and with the text being scanned so that both are expressed in a
standard and correct form, and so more efficient results are produced.

[0042] Referring now to FIG. 3, a flow diagram of a method for processing text in accordance with the present
invention will be discussed. Method 300 provides a first step 310 for receiving character input. As characters are
received and read, the characters may be written at step 312. For example, if the character "e" is received, the letter
"e" may be written to a display device so that the character "e" may be viewed on the display by the user. A
determination is made at step 314 whether a received character is an accent indicator character. For example, the
apostrophe character (') may be assigned as the accent indicator character. A determination is made at step 316 whether
a vowel precedes the accent indicator character. In the event a vowel does not precede the accent indicator character,
then the accent indicator character is regarded as intended to represent its nominal meaning (e.g., an apostrophe), and
method 300 continues with step 310 by continuing to receive further character input. In the event that a vowel does
precede the accent indicator character, then method 300 interprets the vowel and accent indicator character combination
to represent the desire to utilize an accented vowel. In this case, the character set for that vowel that includes a
range of accented vowel characters is fetched at step 318. Method 300 then deletes backwards two characters at step 320
to delete the vowel and accent character tandem. The next character from the character set is read at step 322 whereby
the next vowel character is written in place of the previously written vowel and accent indicator character tandem. When
step 322 is initially executed, the next character written is the first character in the vowel character set. The next
character input is then read at step 324, and a determination is made at step 326 whether the next character input is
also the accent indicator character. The accent indicator character may be repeatedly input (e.g., the user repeatedly
hits the accent indicator character on the keyboard one or more times in succession). By repeatedly inputting the accent
indicator character, the user is able to scroll through the vowel character set until the correct accented vowel
character is written. In the event the next character is the indicator character, for each input of the indicator
character, one character back is deleted at step 326, thereby deleting the previously written vowel character, and the
next character from the vowel character set is written at step 322, and the next character input is read at step 324.
This loop may continue until the desired vowel character, with correct accent and correct punctuation, is written. When
the vowel character set is fetched at step 318, the vowel character set may be written in a circular buffer so that when
the end of the vowel character set is reached, the vowel character set may be read again from the beginning at the first
vowel character upon successive input of the accent indicator character. In the event the next character input is not
the accent indicator character, for example, a space character, the next character input is written at step 330, and
method 300 may continue at step 322.
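
The loop of method 300 can be illustrated with a short Python sketch. It is a simplified reading of the steps above under stated assumptions: the contents and order of `VOWEL_SETS` are hypothetical, every form is a single character so that deleting "one character back" is sufficient, and the display is modelled as a list of written characters.

```python
ACCENT_INDICATOR = "'"

# Hypothetical single-character vowel sets, each read as a circular buffer.
VOWEL_SETS = {
    "a": ["à", "a"],
    "e": ["è", "é", "e"],
    "i": ["ì", "i"],
    "o": ["ò", "o"],
    "u": ["ù", "u"],
}


def process_keys(keys):
    written = []    # characters currently on the display
    active = None   # (forms, index) while the user is scrolling through a set
    for ch in keys:
        if ch == ACCENT_INDICATOR and active is not None:
            forms, index = active
            written.pop()                      # delete one character back
            index = (index + 1) % len(forms)   # wrap around at the end of the set
            written.append(forms[index])       # write the next form (step 322)
            active = (forms, index)
        elif ch == ACCENT_INDICATOR and written and written[-1] in VOWEL_SETS:
            forms = VOWEL_SETS[written[-1]]    # fetch the set (step 318)
            written.append(ch)                 # indicator is written (step 312)
            del written[-2:]                   # delete vowel and indicator (step 320)
            written.append(forms[0])           # first form of the set (step 322)
            active = (forms, 0)
        else:
            written.append(ch)                 # any other character ends the loop
            active = None
    return "".join(written)
```

An apostrophe after a consonant, as in "l'amico", is passed through unchanged in this sketch because no vowel precedes it.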

Accented Character Sets


[0043] The user may be provided with one or more available accented character sets depending upon the level of
writing desired. For example, at least one or more of the following accented character sets may be available:

Set 1: [à é è È ì ò ù]

Set 2: [À É Ì Ò Ù]

Set 3: [î Î ó Ó]

Set 1 may be described as comprising characters: lower case "a" with grave accent, lower case "e" with acute accent,
lower case "e" with grave accent, capital "e" with grave accent, lower case "i" with grave accent, lower case "o" with
grave accent, and lower case "u" with grave accent. Set 2 may be described as comprising five characters: capital "a"
with grave accent, capital "e" with acute accent, capital "i" with grave accent, capital "o" with grave accent, and
capital "u" with grave accent. Set 3 may be described as having four characters: lower case "i" with circumflex accent,
capital "i" with circumflex accent, lower case "o" with acute accent, and capital "o" with acute accent.
[0044] Character sets may be selectively available depending upon the needs of the writer and the level of formality
required. For example, only Set 1 may be available for a very simple Italian writing style such as for personal
correspondence. For writers requiring accented capital letters, for example when writing titles, Set 1 and Set 2 may
both be available to the user or system. For more demanding writing, Set 1, Set 2, and Set 3 may be available, for
example, when a higher level of formality is desired. One or more vowel character sets per vowel may be created based
upon the available character sets. The created vowel character sets may also include essential accented vowels with
punctuation as necessary so as to be able to discriminate between accented vowels, with and without punctuation, and
non-accented vowels, with and without punctuation. For example, the following vowel character sets may be created if
only Set 1 were available:

Set a: [fa d> <fi^ aon]
Set e: [e I □ Bo e«a]
Set E: [E £^ |Aog]
Set i: [<D ^ <Do]

etc. 

If Set 1, Set 2, and Set 3 were all available, Set a and Set e are unchanged, but Set E and Set i, for example, are enlarged to include the additional available characters. Thus, new and/or enlarged vowel character sets may be created as appropriate:

Set E: [E A C^t^ J- ^flo y^n]
Set i: [<D 4> *a ^ CDo]
Set I: [I » «»(ci a Bo]

etc.

The vowel character sets may be ordered or sorted according to the frequency of occurrence in the language of interest, such as Italian, so that the most frequently occurring character will be selected first, the second most frequently occurring character will be selected second, and so on, in order to maximize the efficiency of selecting the desired accented or non-accented, punctuated or non-punctuated vowel.
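The frequency ordering and the wrap-around reading of a vowel character set can be illustrated with a minimal sketch. The function and class names, and the tiny sample text used to count frequencies, are assumptions made for illustration only and are not taken from the specification.

```python
from collections import Counter


def order_by_frequency(candidates, corpus):
    """Sort candidate characters so the most frequent in `corpus` comes first."""
    counts = Counter(ch for ch in corpus if ch in candidates)
    # sorted() is stable, so equally frequent characters keep their given order.
    return sorted(candidates, key=lambda ch: -counts[ch])


class VowelCycle:
    """Reads a vowel character set like a circular buffer (hypothetical helper)."""

    def __init__(self, ordered):
        self.ordered = list(ordered)
        self.index = -1

    def next(self):
        # Wrap around to the first character once the end of the set is reached.
        self.index = (self.index + 1) % len(self.ordered)
        return self.ordered[self.index]


# Illustrative sample text only; a real system would use language statistics.
sample = "è, è, è; né, sé; e"
set_e = order_by_frequency(["e", "è", "é"], sample)
cycle = VowelCycle(set_e)
```

Each call to `cycle.next()` corresponds to one further input of the vowel indicator character.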

[0045] Referring now to FIG. 4, a flow diagram of a method for processing text in accordance with the present invention will be discussed. Method 400 is executed by system 100, and in one particular embodiment, by computer system 200. Method 400 is utilized to create and modify the character sets used by system 100 for a given language. A desired language is selected at step 410. One or more character sets for the selected language are fetched at step 412. One or more of the fetched character sets are selected at step 416 according to the style and formality of language to be utilized. A punctuation level is selected at step 418. The next character set to be used is read at step 420, and the next character is read at step 422. A determination is made at step 424 whether a vowel or character set already exists. If there is no previously existing vowel or character set, a vowel or character set is created at step 428, and the character is added to the vowel or character set at step 430. If there is a previously existing vowel or character set, a determination is made at step 426 whether the character is a new character, and if so, it is added to the previously existing character set. Otherwise, the method continues at step 432. If an end of the vowel or character set is not reached as determined at step 432, then additional vowel or character sets having additional vowels or characters are created. A determination is made at step 434 whether the character sets are completed, and if not, method 400 continues execution at step 420. If the character sets are completed, vowel and character sets are saved in system 100 at step 436, and method 400 ends at step 438.
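As a rough illustration of how per-vowel character sets might be derived from the available accented character sets, the following sketch groups each accented character under its unaccented base letter. The set contents follow the description of Sets 1 to 3 in paragraph [0043]; the function names and the use of Unicode decomposition to find the base letter are assumptions, and punctuation variants are left out.

```python
import unicodedata

# Accented character sets as described in paragraph [0043].
SET_1 = "àéèÈìòù"
SET_2 = "ÀÉÌÒÙ"
SET_3 = "îÎíÍ"


def base_letter(ch):
    """Return the unaccented root letter of a precomposed character."""
    # NFD splits "è" into "e" plus a combining accent; keep the first part.
    return unicodedata.normalize("NFD", ch)[0]


def build_vowel_sets(*available_sets):
    """Create one character set per vowel from the available accented sets."""
    vowel_sets = {}
    for char_set in available_sets:
        for ch in unicodedata.normalize("NFC", char_set):
            root = base_letter(ch)
            # Create the vowel set on first use, starting with the plain vowel.
            members = vowel_sets.setdefault(root, [root])
            if ch not in members:  # only new characters are added
                members.append(ch)
    return vowel_sets
```

With only Set 1 available, no set for capital "I" is created; adding Set 2 and Set 3 creates it and enlarges the sets for "E" and "i".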

[0046] Referring now to FIG. 5, a flow diagram of a method for processing text in accordance with the present invention will be discussed. Method 500 begins with the selection of automatic mode at step 510. Text input data is received at step 512, such as from keyboard 112, file 114, microphone 116 via speech-to-text engine 116, graphical image file 120 via OCR 122, etc. The text input data is read for predetermined character sequences at step 514. For example, the occurrence of an activator event such as an apostrophe that is preceded by a vowel is read and detected. A determination is made at step 516 whether a rule for the read sequence is found, for example in a rules list. If a rule for the read sequence is found, the word is corrected at step 528, for example by removing the text sequence and replacing it with a corrected sequence. For example, a vowel followed by an apostrophe is replaced with an accented vowel character according to the rule for the input character sequence. If the rule for the input character sequence is not found, a wordlist is searched at step 516 for a correctly accented version of the input word according to the read text input data. If the word is found, the word is corrected at step 529, wherein the word in the wordlist replaces the word in the text input sequence. Otherwise, a vowel list is searched at step 522 for a list of possible accented vowel sequences according to the read input sequence. If a vowel sequence is found, the word is corrected according to the vowel sequence at step 528. Otherwise, a fallback rule is read at step 528, and the word is corrected according to the fallback rule.
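The lookup order of method 500 (rule, then wordlist, then vowel list, then fallback rule) can be sketched as follows. The data structures, the sample entries and the function name are hypothetical; the specification does not state how the rules list, wordlist or vowel list are stored.

```python
def correct_word(word, rules, wordlist, vowel_list, fallback):
    """Apply the first source that knows how to correct `word`."""
    # 1. Rules list: here modelled as suffix rules, longest suffix first.
    for suffix in sorted(rules, key=len, reverse=True):
        if word.endswith(suffix):
            return word[: -len(suffix)] + rules[suffix]
    # 2. Wordlist: whole-word entries giving the correctly accented form.
    if word in wordlist:
        return wordlist[word]
    # 3. Vowel list: final vowel plus activator mapped to an accented vowel.
    tail = word[-2:]
    if tail in vowel_list:
        return word[:-2] + vowel_list[tail]
    # 4. Fallback rule.
    return fallback(word)


# Illustrative sample data only.
RULES = {"che'": "ché"}
WORDLIST = {"po'": "po'", "qua'": "qua"}
VOWEL_LIST = {"a'": "à", "e'": "è", "i'": "ì", "o'": "ò", "u'": "ù"}


def leave_unchanged(word):
    return word


def demo(word):
    return correct_word(word, RULES, WORDLIST, VOWEL_LIST, leave_unchanged)
```

In this sketch the fallback simply leaves the word unchanged; any other fallback rule could be passed in its place.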

[0047] Referring now to FIG. 6, a method for processing text in accordance with the present invention will be discussed. Method 600 begins with the reception of a word to be processed at step 610. The word is normalized at step 612; for example, any accented characters are ignored as far as the present accenting is concerned so that the accented character is treated as its root letter character, and optionally as the root letter followed by an activator event. Alternatively, extended character set representations of words, such as 8-bit ASCII, are normalized by being converted to 7-bit ASCII character set words. The ending vowel, if any, is determined at step 614, and a corresponding vowel list is fetched at step 616. The word is compared to the vowel list at step 618, and a determination is made at step 620 whether a match in the vowel list is found. This process continues until a match is found, at which time the accent or punctuation information is returned. A determination is made at step 624 whether to apply the returned accent or punctuation information, and if so, the word is modified accordingly at step 626. In the event it is determined not to apply the returned accent or punctuation information to the word, the word is left unaltered at step 628, and method 600 continues at step 630 for additional words.
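The normalization of step 612 can be pictured with a short sketch in which each accented character is reduced to its root letter, optionally followed by the activator event. The function name, the choice of the apostrophe as activator and the use of Unicode decomposition are assumptions for illustration; characters that have no decomposition are passed through unchanged.

```python
import unicodedata

ACTIVATOR = "'"  # assumed activator event character


def normalize_word(word, mark_accents=True):
    """Replace accented characters by their root letter (plus activator)."""
    out = []
    for ch in unicodedata.normalize("NFC", word):
        decomposed = unicodedata.normalize("NFD", ch)
        out.append(decomposed[0])  # root letter
        if len(decomposed) > 1 and mark_accents:
            out.append(ACTIVATOR)  # remember that an accent was present
    return "".join(out)
```

Both "perché" and "perchè" normalize to the same string, so whichever accent was originally typed no longer influences the accent that is subsequently selected.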

Overriding Automatic Action

[0048] System 100 is capable of overriding or correcting an automatic action generated by system 100, and is further capable of remembering the override event to be applied in future events. Possible variations and extensions, implemented in software, range from a way to use the intercepted input to take quick notes and then paste them, to a way to generate different currency symbols using a single currency key or symbol. System 100 places an information box near the current cursor position, with notes about the correction that was made or the future results of repeated presses. The following are examples of different actions that can be performed by system 100 described herein.



User Input    System Output    Note
e'            è                Changed to grave e
é             è                Changed to grave e
perche'       perché           Changed to acute e
perchè        perché           Changed to acute e
po'           po'              Unchanged (correct)
pò            po'              Changed to apostrophe
po`           po'              Consistent apostrophe style applied
quà           qua              Accent removed
qua'          qua              Apostrophe removed
qual'         qual             Apostrophe converted to space



[0049] The above examples reflect rules that are built into the algorithms for Italian which are part of system 100 in an Italian based embodiment. These rules ensure that system 100 exhibits a reliability exceeding 99% without even requiring an exhaustive dictionary of words.

[0050] The following examples show the effect in a more dynamic situation, where the user repeatedly presses a certain key to intentionally achieve certain results (even overriding Italian rules, in an Italian context).

User Input    System Output    Note
po''          [illegible]      Second press of ' starts loop
e''           [illegible]      First apostrophe, second = [illegible]
e'''          [illegible]      Third press = [illegible]
e''''         [illegible]      Fourth press
e'''''        e'               Fifth press = e'
e''''''       [illegible]      Sixth press restarts from [illegible]
è             è                No change
èè            é                Same loop as that activated by ' (è is easier to reach than é)
$             $                No change
$$            €                Second press = euro sign
$$$           ¥                Third press = yen sign
$$$$          $                Back to step 1
oe            oe               No change
oee           ö                Second press of "e" starts loop
oeee          oe               Third press toggles back to beginning
oe            ö                Variant of the above
oee           oe               Loop like above, but in different order
ö             ö                No change
öö            oe               Same loop as that activated by "e" (less practical, though)
ss            ss               No change
sss           ß                Triple s begins German sharp-s/ss loop
ssss          ss               Back to step 1



[0051] The above are some examples of what system 100 executes in a "dynamic" mode. The dynamic mode causes different characters to be displayed one after the other which, when system 100 is implemented as a keyboard hook, is achieved for example by sending fake character and backspace inputs to system 100, and can be applied in different ways based on different activator event keys and different output sequences, which can be static or dynamic, e.g. "learning" from past selections, and also in combination with Italian rules to generate the most likely desired output first. Variations include, for example, causing the above "ss" cycle to break the loop after the third "s" is pressed so that repeated actuations result in the same number of "s" characters, or the insertion of just one more step to generate three "s" characters in a row, but then restart from "ss", "ß", etc. The loops can be closed, beginning again from a certain step, which is not necessarily the first step, such as with a circular buffer, or open, wherein after a predetermined number of identical inputs, the output becomes identical with the input, and can consist of any number of entries.
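The difference between closed and open loops can be made concrete with a small sketch based on the currency example. The function name and its parameters are hypothetical; `restart` models a closed loop that begins again from a step other than the first.

```python
def loop_output(key, presses, cycle, mode="closed", restart=0):
    """Return the output after `presses` (>= 1) consecutive presses of `key`."""
    if presses <= len(cycle):
        return cycle[presses - 1]
    if mode == "open":
        # Open loop: past the last entry, the output equals the raw input.
        return key * presses
    # Closed loop: wrap around, beginning again from step `restart`.
    tail = cycle[restart:]
    return tail[(presses - len(cycle) - 1) % len(tail)]


CURRENCY = ["$", "€", "¥"]
```

With `CURRENCY`, a fourth press in a closed loop returns to "$", whereas in an open loop it simply yields "$$$$".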
[0052] A learning mode is provided by system 100 where the most frequently used character or currency symbols could always be output first in the loop, depending upon the statistically most encountered selection. The examples illustrate that system 100 is capable of utilizing more than one activator event, for example, apostrophe, accented character, specific currency character, generic currency character, to access a given character loop or set. Thus, some keys and characters, such as the apostrophe, are utilized generically and combined with a previous character, while other keys and characters, such as accented characters, serve both as a reference to a base character and also as an activator event.
[0053] The order according to which the items in these dynamic loops occur is defined as static (program-defined), static (defined by user in program settings), dynamic (frequency-based), dynamic (frequency-based, with optional adjustable limit before changes occur), and dynamic (combination of the previous ones with Italian language rules with which to calculate the most likely cases). For example, system 100 could be programmed in a manner such that the order in which the items appear is changed only after two (or one, or three, or more, etc.) consecutive selections of an item which is not already at the first place in the list, or it could be set such that the order changes after the total number of selections in a given timeframe, which can be unlimited, i.e., not time related at all, is such that a certain item is desired over another one at least 5%, or 10% more times, etc.
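One of the dynamic orderings described above, in which an item moves to the first place only after a set number of consecutive selections, might look like the following sketch. The class name, method name and default threshold are assumptions for illustration.

```python
class AdaptiveLoop:
    """Reorders a character loop after repeated consecutive selections."""

    def __init__(self, items, threshold=2):
        self.items = list(items)
        self.threshold = threshold  # consecutive selections needed to reorder
        self._last = None
        self._streak = 0

    def record_selection(self, item):
        """Record the user's final choice; `item` must be in the loop."""
        if item == self._last:
            self._streak += 1
        else:
            self._last, self._streak = item, 1
        # Promote only items that are not already in first place.
        if item != self.items[0] and self._streak >= self.threshold:
            self.items.remove(item)
            self.items.insert(0, item)
            self._streak = 0
```

A threshold of one makes the loop follow the most recent selection, while larger values make the order more stable.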

[0054] The first press of an apostrophe automatically outputs a word or character having the correct accent in accordance with the rules described herein without further user intervention. Subsequent consecutive encounters or actuations of the activator event, the apostrophe, activate a dynamic manual selection mode. Other peculiarities of Italian are also considered. For example, instead of, or in addition to, an apostrophe key or character as a desired way to initiate the automatic placement of the sign which system 100 determines to be the best, since in Italian a vowel never appears twice, each vowel itself could act as an activator event which, encountered or pressed more than once, initiates a character selection loop. System 100 may be programmed to implement this and other similarly based modes.
[0055] A rule-based approach is also possible for languages such as German, which has the special characters "ä", "ö", "ü", "ß", "Ä", "Ö", "Ü", where the context can help system 100 determine whether "oe" is more likely to mean "oe" than "ö", for example, and propose that as a first choice when the user writes "oe", and the other when the user presses "e" again ("oee" = "ö"). As with an Italian embodiment, repeated identical characters could be used instead of the vowel + "e" combination.

[0056] Additional ways for the user to specify a certain accent may be implemented by system 100. For example, the user could use the characters "\" or "/", or both in combination to indicate a circumflex accent, before the apostrophe to quickly specify the accent. For example, "a\'" would mean "à", "a/'" would mean "á", and "a\/'" or "a/\'" would mean "â". Another variant is the placement of the symbols before the vowel, as in "\a'" etc.

[0057] System 100 provides different ways for the user to override system 100, and just enter exactly what is typed. System 100 can for example use the Num Lock key for this purpose. On one hand, system 100 ensures that Num Lock is always switched on if the user desires, and on the other it then interprets any Num Lock actions as on/off commands for its text processing system as described here. Num Lock is a key that effectively has little practical use, so this action provides two benefits in one. Scroll Lock or other keys could also be used in a similar manner. For temporary on/off, it is possible to hold down certain keys while entering text that would otherwise be modified. The user can go back after an automatic correction, and rewrite the text so that it is not modified a second time.
[0058] Examples of additional program options include a setting to make sure that accent changes in the middle of a word, as opposed to changes at the end of a word, are applied only while typing, and not on file operations. This would be on the assumption that accents placed in the middle of a word, which typical Italian never uses, have been placed with proper knowledge. More options provided by system 100 include the possibility to scan a text file for patent character set errors, which might for example have led to the word "perché" becoming "perchX:" or "perchf".

Input and Output Interfaces 

[0059] Depending on the hardware and software with which system 100 of the present invention is utilized, examples of sources from which input data can be acquired include the operating system, an input method system interface, an error-handling interface, an accessibility interface, e.g., as used to handle input, output and context for blind users, or an application such as a piece of software, or the keyboard system or other hardware, or display memory, or computer memory. Text context data is acquired either from the operating system, or from an input method system interface, or from an error-handling interface, or from an accessibility interface, or from an application, or from display memory, or from computer memory, or by buffering the input data. Output is sent to the operating system, or to an input method system interface, or to an error-handling interface, or to an accessibility interface, or to an application, or to the part normally receiving data from the keyboard system. If the input stream cannot be intercepted for exclusive use, then output is generated in such a way as to produce the deletion and replacement of the parts that require modification, for example by inserting "cursor", "backspace" and "delete" control commands as appropriate.

Input and Context

[0060] System 100 recognizes certain input events as causing a disruption of context, requiring the collection of new context information. For example, when the user moves the cursor with the mouse, or moves the cursor up or down, or selects an application command via the mouse or keyboard, system 100 takes steps to try to reconstruct the new text context, i.e., the text surrounding, or at least preceding, the new cursor position. Modern operating systems, such as MICROSOFT WINDOWS, provide dedicated interfaces for this purpose, designed to give text context data for accessibility purposes, e.g., to read out the current text context to a blind user, or as part of an input method system, which system 100 described here embodies for languages such as Italian. If context collection through this type of system call is not possible, it may still be possible to obtain equivalent information directly from the application currently being used. For example, applications such as MICROSOFT WORD provide such information. On systems where neither the application nor the operating system provides such information, it is always possible to buffer the input data as it is being typed, and resort to that information as the context data. However, when the text context is lost, for example after a vertical cursor movement, it is desirable to utilize different techniques to collect text context information, at least for the part immediately preceding the new cursor position. On-screen optical character recognition (OCR) is one such option. System 100 employing on-screen OCR recognizes the cursor because it is the only object on screen that flashes. Alternatively, system 100 queries the operating system, and then analyzes the surrounding screen bitmap for text patterns. Other techniques are also utilized. For example, with certain operating systems and applications, it is possible to directly access the region of memory that provides the necessary text context information. Where no context is available, system 100 utilizes generic, not context-specific, likelihood rules.

15 Context Information 

[0061] Context information provides the following data: the last, current, word up to the current insertion or input point; information on whether the context before that word required a capital initial, that is, an upper case character; information on whether the context before the current insertion or input point includes a single opening quote with no corresponding closing quote; and the text language for the block including the last word, if specified by the user or otherwise known. For one embodiment of system 100 in accordance with the present invention, for example on a slower system, word context information alone, such as the last word, could be sufficient, depending on the requirements concerning execution speed, memory use and output quality. Word context is an important piece of context. Even partial word context is useful, for example when the language rules include suffix rules.

[0062] The special handling of a single quote character is utilized for Italian and other languages where one embodiment utilizes the apostrophe character, or possibly any character resembling it, such as the "acute" character, as an activator event. Defined as variables, in a possible implementation the context variables could be:

context.buffer = context string and data
context.word = string
context.capital = yes/no
context.expectsinglequote = yes/no (or counter)
context.language = language code
context.typemode = insert/overstrike

[0063] These variables are provided by the host environment, such as the input method interface, word processing application, etc., or are calculated by system 100.
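As one way to picture the context variables listed above, they could be grouped in a single record. This is only a sketch: the Python types, the default values and the snake_case field name for the single-quote counter are assumptions, not part of the specification.

```python
from dataclasses import dataclass
from enum import Enum


class TypeMode(Enum):
    INSERT = "insert"
    OVERSTRIKE = "overstrike"


@dataclass
class Context:
    buffer: str = ""              # local copy of the surrounding text
    word: str = ""                # current word up to the insertion point
    capital: bool = False         # would the word need a capital initial?
    expect_single_quote: int = 0  # counter of unmatched single opening quotes
    language: str = ""            # language code, empty if unknown
    typemode: TypeMode = TypeMode.INSERT
```

Using a counter rather than a yes/no flag for the single quote field allows nested quotes to be tracked.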

[0064] In the context of an interactive use, for example system 100 applied while the user is typing, context.buffer is an optional copy of the local text region, which is dynamically maintained by system 100 while the user is entering text. The purpose of this data is to be able to provide information about the current word, i.e., to construct context.word, when system 100 is applied to an environment where the application in use, or the operating system, is unable to provide text context information. This data consists of a string of characters that represents a "sliding window" region of the text currently being typed, plus status variables that indicate the cursor position with respect to the buffer, and the size of the buffer. When the user types characters of text, these are appended to the string in context.buffer until a maximum size has been reached, after which new characters are added, and old characters are discarded from the buffer, as necessary to maintain the maximum buffer size where one has been set. Depending on the implementation, characters may be discarded at the beginning of the text buffer, but not at the current word, if the cursor is at the beginning of the buffered data, or at the farthest point from the cursor position, or using other preferences. When the user uses the cursor left/right keys to move the cursor in the application currently in use, the cursor position in the local buffer is also updated accordingly. During cursor left/right events, the cursor position may temporarily fall outside the current window of buffered characters without requiring the buffer itself to be reset, but if text is then added outside the buffered region, then the buffer contents may be reset, as the contents of neighboring regions of text are unknown. The context buffer data is also reset when the user uses the cursor up/down keys, or when the mouse is used to move the cursor or to execute commands, or when certain combinations of keys are pressed to execute a command, if system 100 cannot determine how these events affect the text currently being written.
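A minimal sketch of such a sliding-window buffer is given below, assuming the policy of discarding the character farthest from the cursor. The class and method names are hypothetical, and word characters are taken to be letters, digits and the dash sign.

```python
class ContextBuffer:
    """Sliding window over recently typed text, with a cursor offset."""

    def __init__(self, max_size=64):
        self.max_size = max_size
        self.reset()

    def reset(self):
        self.text = ""
        self.cursor = 0

    def _cursor_inside(self):
        return 0 <= self.cursor <= len(self.text)

    def type_char(self, ch):
        if not self._cursor_inside():
            # Text added outside the buffered region: neighbours are unknown.
            self.reset()
        self.text = self.text[: self.cursor] + ch + self.text[self.cursor :]
        self.cursor += 1
        if len(self.text) > self.max_size:
            # Discard at the end that is farthest from the cursor.
            if self.cursor >= len(self.text) - self.cursor:
                self.text = self.text[1:]
                self.cursor -= 1
            else:
                self.text = self.text[:-1]

    def move(self, delta):
        # Cursor left/right; may temporarily fall outside the window.
        self.cursor += delta

    def current_word(self):
        if not self._cursor_inside():
            return ""
        start = self.cursor
        while start > 0 and (
            self.text[start - 1].isalnum() or self.text[start - 1] == "-"
        ):
            start -= 1
        return self.text[start : self.cursor]
```

Vertical cursor movement, mouse actions and commands would call `reset()` in this sketch whenever their effect on the text cannot be determined.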

[0065] A word as stored in context.word is defined as a sequence of characters building a single word, such as it would appear in a dictionary. This includes, without being limited to, letters, digits, and the dash sign. An apostrophe sign before a word is not considered part of the word for the purposes of one embodiment of system 100. An apostrophe sign, or other sequence of one or more non-word signs after the last letter of a word, is processed as a possible activator event sequence when system 100 is applied to Italian and certain other languages, rather than being accepted without action as part of that word. If the word context cannot be determined, either by interfacing with the application, or by interfacing with the operating system, or through the local context buffer context.buffer, then the word context string is left empty. This may for example occur if system 100 is poorly integrated with the host environment, so that context information can only be acquired through buffering of the input characters, and is lost after vertical cursor movement. A buffer holding more context text than the current word is both desirable and useful, as it avoids having to request context information, which may also be unavailable, from the operating system or application after horizontal cursor movement and text deletions going backwards beyond the current word. A possible implementation is a circular text buffer of constant length, from which the current context word is derived as necessary.

[0066] The variable context.capital is set to yes if, according to the punctuation or other context attributes (e.g., beginning of sentence), the word stored in context.word would need to be capitalized. This information is not used for capitalization purposes, but rather because some accent rules need to know if an unknown word is likely to be a proper noun or not, and proper nouns can be recognized by the capital initial, but only if such capital initial is not context-specific. The variable context.expectsinglequote would be set to yes if it was determined from the context that, within a certain range, the maximum of which can be specified, e.g., as one or two sentences from the current position, or as a certain number of characters or words, going back from the current position, certain characters were found which are normally used as an opening sequence for certain types of quotes for which the closing sequence may consist of one or two consecutive apostrophe or grave characters, but no such closing sequence was found. Knowledge of this is valuable because if an apostrophe is for example found immediately after an unknown word, especially where it is intended to produce good results even with unknown words, it might indicate an accent or instead a closing quote. Certain rules for placing accents where automatic correction is desired and no user preference is given leave an apostrophe or grave character unchanged after an unknown word, if a closing single quote is expected. A single opening quote is in general defined as an apostrophe sign (decimal ASCII code 39, or similar characters), a grave character (decimal ASCII code 96, or similar characters), or a comma (decimal ASCII code 44, or similar characters) immediately preceding a word. An additional optional condition to recognize such an opening sequence is where the sign does not immediately follow a letter, or where it immediately follows a space or line feed, or appears at the beginning of a sentence, or where it immediately precedes a letter. The sign may also appear twice, i.e., consecutively, or for example in single-quotes within double-single-quotes within double-quotes, in which case it is expected that the context.expectsinglequote condition not be cleared until all quotes are matched. For this purpose, a counter field is associated with this information. The context.expectsinglequote condition is cleared after a certain number of characters, words or sentences, in order to avoid the carrying over of possible interpretation errors. Furthermore, the single quote counter is not increased or decreased for single quotes that are recognized as having a specific purpose that does not require paired sets of quotes. This applies, for example, to single quotes appearing as part of a quoted possessive or negative or known abbreviation form, as in "Will said 'don't play with Mary's ball before 5 o'clock, or I'll be very angry.' and went to work." Known patterns such as "*'t", "*'s", "o'*", "*'ll" ("*" denoting any word-string) could be part of a list used to exclude certain single quotes from the count of opening and closing quotes.

[0067] The variable context.language indicates the language of the current context. This is used by certain accent rules because, if a word requiring accent action is found that is unknown, then no action should be taken if the word is known to be not Italian or another language for which this system can be applied. As an example, if the implementation is based on a set of suffix rules, with optional dictionary words, usually providing exceptions to rules, and where a word does not match any dictionary entry, system 100 applies rules, and if no rule is found, a fallback rule would be applied, for example, a rule saying that if the last letter of the word is "a", then an activator event after the last vowel would mean that the "a" should be converted to an "à". If the host environment, such as the operating system or application, provides no language information, a method is used to identify Italian text by comparing all bigrams (letter pairs) in the current word with a table of bigrams used in Italian. This technique occupies about 100 bytes of memory to store bigram data for all possible pairs, is fast, and for Italian provides reliable results because Italian uses only a small part of the possible two-letter combinations, only about one third of all possible combinations. The table of bigrams is stored so that each possible bigram is represented by one bit, which is set to 0 or 1 to indicate that that bigram is used in Italian, or other language to which this invention is applied, or is not. A bigram language analysis is fast and improves the reliability of accent rules on slower systems where a word-based analysis using a whole dictionary of stored words or other techniques might use too much memory and execution time.
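The one-bit-per-bigram table can be sketched as follows. With 26 letters there are 676 possible pairs, which fit in 85 bytes, consistent with the figure of about 100 bytes given above. The function names are hypothetical and the word list used to fill the table is a tiny illustrative sample, not real Italian bigram statistics.

```python
import string

ALPHABET = string.ascii_lowercase


def bigram_index(a, b):
    """Position of the letter pair (a, b) among the 26 * 26 possible pairs."""
    return ALPHABET.index(a) * 26 + ALPHABET.index(b)


def build_table(words):
    """Set one bit for every bigram that occurs in `words`."""
    table = bytearray((26 * 26 + 7) // 8)  # 676 bits -> 85 bytes
    for word in words:
        word = word.lower()
        for a, b in zip(word, word[1:]):
            if a in ALPHABET and b in ALPHABET:
                i = bigram_index(a, b)
                table[i // 8] |= 1 << (i % 8)
    return table


def looks_like(word, table):
    """True if every plain-letter bigram of `word` is marked in the table."""
    word = word.lower()
    for a, b in zip(word, word[1:]):
        if a in ALPHABET and b in ALPHABET:
            i = bigram_index(a, b)
            if not table[i // 8] & (1 << (i % 8)):
                return False
    return True


# Illustrative sample only; a real table would be built from a large corpus.
SAMPLE_WORDS = ["perche", "citta", "quando", "virtu", "casa", "sera", "cosi", "piu"]
ITALIAN_TABLE = build_table(SAMPLE_WORDS)
```

Pairs that involve an accented or non-letter character are skipped in this sketch.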

[0068] The variable context.typemode indicates whether, during interactive text input mode, text is being inserted, i.e., text to the right of the current cursor position moves to the right as new text is entered, or overwritten, i.e., new characters replace existing characters. This information is used both to appropriately update the local context buffer, and when sending fake input characters to replace one string with a new one. For example, in insert mode, to replace a character with another one the user, or a system simulating user input, presses the Backspace key followed by the new character. In overstrike mode, however, the user uses the Cursor Left key instead of Backspace, or otherwise one character of unrelated text following the replacement point is lost. Alternatively, the user, or system 100 emulating user input, temporarily changes the TypeMode as appropriate before text input, and then restores the original status. The TypeMode is typically changed with an appropriate application or system command. Under MICROSOFT WINDOWS and other operating systems running on PCs the TypeMode is changed using a dedicated keyboard key, named Insert. Use of this key is intercepted or simulated as necessary. Applications typically initiate in insert mode, with a few exceptions starting in overstrike mode, such as the MICROSOFT WINDOWS Command Prompt window, which are known, and/or which the user may want to program with appropriate settings, and system 100 described here must keep track of all actions which affect the TypeMode status. A few applications use the Insert key for other purposes, for example MICROSOFT WORD can use it to insert clipboard text, but these same applications usually provide an interface with TypeMode status information. On other systems and applications, equivalent keys and commands are detected and simulated as necessary.
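The effect of the type mode on the simulated keystrokes can be shown with a brief sketch. The function name and the key labels such as "<Backspace>" are placeholders, not the API of any real keyboard library, and the overstrike branch is deliberately limited to replacements of equal length.

```python
def replacement_keys(new_text, old_length, overstrike=False):
    """Keys to send so that the last `old_length` characters become `new_text`."""
    if not overstrike:
        # Insert mode: delete the old characters, then type the new ones.
        return ["<Backspace>"] * old_length + list(new_text)
    if len(new_text) != old_length:
        # Handling a length change in overstrike mode would need extra steps,
        # e.g. temporarily switching the type mode; not covered here.
        raise ValueError("overstrike sketch handles equal lengths only")
    # Overstrike mode: step back over the old text and type over it.
    return ["<Left>"] * old_length + list(new_text)
```

Replacing "e'" by "è" in insert mode therefore takes two Backspace events followed by the accented character.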

Activator Event Sequences

[0069] An appropriate activator event character for Italian is the apostrophe. The function of system 100 described herein lies in the algorithms employed to ensure that the apostrophe character is properly converted to an accent, or left as is, or recognized as an error and therefore totally removed from the input stream. The acute character, the character under the tilde character on US keyboards, may be utilized instead of the apostrophe, or to explicitly set one type of mark instead of the default one placed by the apostrophe sign.

[0070] For one embodiment of the invention applied to Italian, accented vowel characters that are present in the input stream are processed as if they were two separate characters, namely, the vowel character plus the activator event expressing an intention to select an appropriate character, different from an exact character. This is equivalent to an occurrence of the vowel character followed by an apostrophe character, with the exception that the considerations dealing with possible single quote character ambiguities, for example a closing quote character, do not need to be applied. It is effective to treat accented vowels appearing in the text stream using an interactive mode, for example when the user is typing, and only if such accented vowels appear at the end of words, and optionally unless they are not preceded by an apostrophe, when working on file or clipboard data. Accented vowels appearing in the middle of words are usually not Italian, and are written by more sophisticated writers who intentionally utilize such characters, and an interactive mode provides additional control to correct or change the proposed accent or apostrophe. In other words, in an interactive mode it is fine to take action one letter after the other, as they are typed, and the concept of inside a word does not exist, because during normal typing of a word letters are always at the end of the partial word. System 100 described herein provides an intuitive way of looping from one character to the other. On file and clipboard data, system 100 determines when something occurs at a true end of a word, and there is no option for user interaction, so changes in the middle of a word are normally not applied, unless specific user settings or dictionary entries require such a change, or at least not based on generic suffix rules alone. Thus, in one embodiment of the invention, accented characters appearing at the end of a word but before an apostrophe are left unchanged.

[0071] Certain characters (V, T, "I", apostrophe, acute, etc.) are optionally utilized by system 100 to explicitly
express what type of accent or apostrophe is placed. This provides one way to handle exceptions. It may not be utilized
by an average writer, and in an alternative embodiment the dictionary is extended, rather than using such a method
when typing. This method is useful to handle exceptions when encoding accented text as 7-bit ASCII, for future
reconversion.

[0072] Repeatedly pressing an activator event during text input toggles the state of different diacritical signs, such
as acute, grave, circumflex, apostrophe, umlaut, no sign, etc. This set of signs, as well as the desired order, is based
on language and user settings, and is optionally adapted dynamically based on the frequency of previous selections. In
other languages (non-Italian), or for language-neutral applications, such as for entering currency or other symbols that
are not present on a keyboard, this embodiment is used in combination with certain predetermined sequences of
characters that normally do not occur in normal text. For example, currency symbols are usually never used more than once
in a row, so the repeated pressing of a currency character, e.g., could be recognized and processed as an activator
event by system 100, initiating a certain action. In many languages, action is initiated after a certain character is
pressed two or three times, or when this is done one, two or three times after a certain context, for example, in German,
repeating "e" after an existing "ae", "oe" or "ue" initiates a loop toggling between the two letter pair and the first letter
with an umlaut.
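
The toggling loop described above can be illustrated with a minimal sketch. The names `SIGN_ORDER`, `render` and
`cycle_sign`, the particular sign order, and the use of Unicode combining marks are assumptions made for illustration
only; the text states that the real set and order of signs depend on language, user settings and selection frequency.

```python
import unicodedata

# Hypothetical sign order for one loop (an assumption, not taken from the text).
SIGN_ORDER = ["grave", "acute", "circumflex", "apostrophe", "none"]

# Unicode combining marks for the accent signs; the apostrophe is a plain character.
COMBINING = {"grave": "\u0300", "acute": "\u0301", "circumflex": "\u0302"}


def render(vowel: str, sign: str) -> str:
    """Return the vowel with the given sign applied."""
    if sign == "none":
        return vowel
    if sign == "apostrophe":
        return vowel + "'"
    # Compose base letter + combining mark into one precomposed character.
    return unicodedata.normalize("NFC", vowel + COMBINING[sign])


def cycle_sign(vowel: str, presses: int) -> str:
    """Return what the loop shows after `presses` activator events (presses >= 1)."""
    sign = SIGN_ORDER[(presses - 1) % len(SIGN_ORDER)]
    return render(vowel, sign)
```

With this assumed order, pressing the activator once after "e" yields the grave form, a second press the acute form,
and the sixth press wraps around to the first option again.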

Automatic Changes and User Re-corrections or Further Modifications


[0073] When system 100 is utilized in an interactive mode, for example during typing, a loop providing multiple
options is initiated following an automatic change. A loop is also initiated to more simply override the automatic change
and to manually enter some text. The user additionally has other manual ways to input sequences which would normally
be changed by system 100, such as certain combinations of characters which might otherwise initiate an IME loop that
does not include those combinations of characters. This includes using traditional text editing sequences to input one
character at a time, separated by a space character, and then removing the space character. Even after very short use
of this system the user becomes familiar and comfortable with the fact that certain actions result in automatic changes
and loops, and because these automatic actions are very predictable, the ways to avoid them, if necessary, also come
very naturally to the writer.

Rules and Data Structures used for Italian Text Input

[0074] When system 100 is utilized with Italian, in which case the considerations using apostrophe and accent
characters apply, reliability is provided when automatically producing a correct output on a first try, without requiring
additional user feedback or efforts. System 100 automatically places an appropriate accent or apostrophe mark at the
end of words, which in general means on or after a vowel. For one particular embodiment, it is sufficient to utilize a set
of word suffix pattern matching rules, with appropriate priorities, and default fallback cases. Specific words may be
included in the rules, mainly to define rules and exceptions that are associated to certain exact words, rather than to
groups of words ending with a certain suffix.

[0075] In certain cases a word is written in a particular way rather than being based upon an assumption for a
sequence of suffix rules, for example to produce a positive match on a foreign word, or one that had not been
considered as an exception. Such a case is, for example, when the user is writing inside a single-quote context. In such a
case, system 100 considers that an apostrophe character after a word which explicitly appears in the dictionary as
written with a certain accent or apostrophe is not a closing quote, but rather should be transformed into an accent, whereas
if instead the word does not produce an exact match, but rather only satisfies the suffix rules, then system 100 displays
an information box and optionally produces an audio cue, while producing a default output. The default output in such
uncertain conditions, for example where there could be a closing single quote, or an accented unknown word, is based
on statistical considerations about the likelihood of a closing quote at a certain distance, for example measured in
characters, or words, from an opening quote, as opposed to the statistical likelihood of an accent sign on an unknown given
word. This information, together with accents on words that match suffix rules, but not exact word entries, is collected
and stored by system 100, so that user choices progressively converge and system 100 produces better results on a
first try.

[0076] In an alternative embodiment an exhaustive dictionary of words, in addition to suffix rules, is provided for an
editorial context. A professional publishing house might have a policy to check every single word. In such an
embodiment, even when the suffix rules produce the correct output, a warning is issued informing the user that a word is
unknown, as is done for unknown words in general, based on a traditional error detection approach. The system described
is applied independently from traditional spelling checkers and similar technologies and, in comparison, requires less
manual intervention and is more reliable.

[0077] The data structure presented herein can be used for all of these purposes, integrating a varying number of
suffix rules and exact word entries, based on accuracy, speed and memory overhead priorities. Even where system 100
utilizes only a few dozen suffix rules and exception words, a first hit reliability exceeding 99% for the average Italian
writing needs is provided in one particular embodiment.

[0078] The context language status variable is used to determine whether to apply Italian rules to the text, or not.
This particular embodiment of system 100 employs two additional techniques to prevent possible errors. First, a
bigram table is used so that Italian suffix rules are not applied to words that contain one or more bigrams that do not
normally occur in Italian words. Second, a list of certain word patterns is provided for words which in English, often used
in an Italian context, are associated to an apostrophe sign, and which do not occur in Italian. English words ending with
a vowel and which are more frequently followed by an apostrophe (e.g., "I", "he", "she"), and which do not have an Italian
accented equivalent, are listed together with other Italian words, but with appropriate flags indicating that these words
are not normally accented, i.e., system 100 does not convert an apostrophe after these words to an accent, which is a
default action for Italian, unless a word is known to occur with a final apostrophe. Additionally, system 100 includes a
special list of words, rules with a POSTAPOSTROPHE flag, which are known to only exist after an apostrophe, and
which are used in English, such as: "s" (also used in German), "d", "ll", "ve" and "em". When the user writes a word
ending in a vowel and followed by an apostrophe, and the Italian rules, possibly a suffix or fallback rule, cause the word to
be accented by system 100, and then one of these known "post-apostrophe" words occurs, and then the word ends, system 100
restores the previously changed apostrophe. For example, considering "I'll go home", system 100 recognizes "I" as a
word which can be either without accent in Italian or followed by an apostrophe in English, but never accented, not even
in Italian, and leaves the apostrophe as is. Had the entry for the word "I" not indicated that the word does not exist with
an accent, system 100 would still be able to retroactively correct a change from apostrophe to accent after checking the "ll"
word, as is done for the case below.

[0079] Considering "Gina's car is red", system 100 may initially convert "Gina'" to "Ginà", assuming that no exact
entry for "Gina" exists indicating that the accented "Ginà" does not exist, therefore applying a generic suffix or fallback
rule. But then, after the following non-word character, system 100 recognizes the apostrophe + "s" pattern, and restores
the apostrophe.
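
The retroactive restoration in the "Gina's" example can be sketched as follows. This is a simplified illustration, not
the implementation of system 100: the function names `strip_final_accent` and `recorrect` are hypothetical, and the
particle set simply reuses the examples named in the text.

```python
import unicodedata

# Particles named in the text as occurring only after an apostrophe.
POST_APOSTROPHE = {"s", "d", "ll", "ve", "em"}


def strip_final_accent(word: str) -> str:
    """Remove any combining accent from the last character of `word`."""
    if not word:
        return word
    last = unicodedata.normalize("NFD", word[-1])
    base = "".join(c for c in last if not unicodedata.combining(c))
    return word[:-1] + base


def recorrect(converted: str, next_word: str) -> str:
    """Undo an apostrophe-to-accent change if a post-apostrophe particle follows.

    `converted` is the word after the automatic change (e.g. "Ginà");
    `next_word` is the word string completed directly after it (e.g. "s").
    """
    if next_word.lower() in POST_APOSTROPHE:
        return strip_final_accent(converted) + "'" + next_word
    return converted + next_word
```

In this sketch the check runs only once the second word string is complete, which mirrors the statement that the
restoration happens after the following non-word character.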

[0080] In one particular embodiment, the data structures for Italian consist of a series of lists which all deal, in one
way or another, with apostrophe and accent information. The lists consist of five sorted lists ("A", "E", "I", "O", "U"), each
containing rules for words and word suffixes ending with the corresponding vowel. Each entry can refer either to a word
suffix, that is to a group of words ending with the same suffix, or to an exact word, and can have one or more flags. One
optional list is provided of words ending with a consonant which are nevertheless frequently written followed by
an apostrophe even if no apostrophe should be placed after the word. For example, the list contains an entry indicating
that the word "qual" is never to be followed by an apostrophe. In this case, as explained in the general overview, system
100 replaces the apostrophe with a space character. These entries usually only have the NOTHING flag. One list of
replacement rules is also provided. These can optionally be enabled to place accents inside certain foreign words. For
example, a rule could say that if the user writes "Cezanne", the word is automatically converted to "Cézanne". These
rules, like the rules for words ending in a consonant, complete system 100 in an optional way in that they can be
optionally provided, and do not affect the main feature of input method editor functionality. One optional list of words
that are known to occur after other words, separated only by an apostrophe character, is also provided. These
entries include mostly particles, such as "s" for English and German possessive forms, "d", "ll", "ve", etc. This list
enables system 100 to recover after the fact from certain incorrect changes that might have been applied as a result of
suffix and fallback rules included in the five vowel lists. As a result of this list, system 100 becomes more reliable even
when writing for example in English, and when language detection is not possible.

[0081] Word and suffix entries in the lists are case, accent and apostrophe insensitive, that is the entries produce
matches ignoring accent, apostrophe and case information. This is also referred to as normalized. The entries only
consist of the letters "a" to "z", and the dash symbol ("-"), if surrounded by other characters. One asterisk character is used,
at the beginning of an entry, to indicate that the entry refers to a word suffix, and not an exact word (e.g., "*che" vs.
"perche").
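
A minimal sketch of the normalization and of the asterisk convention described above follows. The function names
`normalize` and `entry_matches` are hypothetical, and the set of characters treated as apostrophe-like is an
assumption; the text only requires that matching ignore case, accents and apostrophes.

```python
import unicodedata

# Characters treated as apostrophe-like in this sketch (an assumption).
APOSTROPHE_LIKE = "'\u2019`\u00b4"


def normalize(word: str) -> str:
    """Lowercase `word` and drop accents and apostrophe-like characters."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(
        c for c in decomposed
        if not unicodedata.combining(c) and c not in APOSTROPHE_LIKE
    )


def entry_matches(entry: str, word: str) -> bool:
    """Match a list entry against a word; a leading '*' marks a suffix entry."""
    normalized = normalize(word)
    if entry.startswith("*"):
        return normalized.endswith(entry[1:])
    return normalized == entry
```

Under these assumptions the suffix entry "*che" matches any word whose normalized form ends in "che", while the
exact entry "perche" matches only that word, however it was accented or capitalized in the input.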

[0082] The following flags and attributes are optionally associated, also in combinations, to the entries: 

NOTHING: this flag indicates that the word, possibly also, exists without a final accent or a final apostrophe sign.
GRAVE: the word, possibly also, exists with a final grave accent.
ACUTE: the word, possibly also, exists with a final acute accent.
CIRCUMFLEX: the word, possibly also, exists with a final circumflex accent.
APOSTROPHE: the word, possibly also, exists with a final apostrophe immediately following the last letter.
APOSTROPHERARE: used with APOSTROPHE, meaning that use of the word with apostrophe is very rare. The
user may decide to set system 100 in a way that the word is not considered to have APOSTROPHE if
APOSTROPHERARE is set, which would improve the automatic detection of certain common apostrophe and accent errors.
INFORMATION=string: This is an information text that may be displayed as a tool tip above the cursor position, or
elsewhere on the screen. It could say something like: "This word is used with or without accent. Without accent it
means XYZ. With accent it means ABC." Usually the tool tip is displayed to inform the user that an entry with accent
or apostrophe is probably not what was meant, i.e., not necessarily an error, but more likely to be one than not. In
the program settings, the user can decide to display different types of messages.

COMPOUNDSTRICKY: This flag, used with words that have no accents, indicates that compounds of that word do
have an accent. This is a confusing condition for the writer, and this flag allows for a more detailed explanation to
the user, depending on the desired level of information messages. For example, if the user wrote "tre" with an
accent, system 100 removes the accent and displays a message saying that "Unlike its compounds, 'tre' is written
without accent".
TRICKYCOMPOUND=string: This string attribute indicates the COMPOUNDSTRICKY word of which an entry is a
compound, for the purpose of displaying complete information to the user, if desired.

ITALIANIZED (LANGUAGECODE)=string: This attribute and additional string fields indicate that the word is an
"Italianized" version of the word, which in the original language is written differently. Italianization of words is not as
frequent today as it used to be, and often resulted in accented words. In modern writing the original words, English,
French, etc., tend to be more desirable than the old Italianized forms. Appropriate program settings are, for
example, used for automatic replacement with the desired word variant to consistently use the Italianized or the original
forms. String attributes complete the information by indicating the original word, specifying the original language.
CAPITAL: This attribute indicates that the word to which the rule refers always appears with a capital letter. This
information is useful to detect certain proper nouns for which specific accent rules apply.
TRADEMARK: This attribute indicates that the word is a trademark or registered trademark. This information is
displayed to the user as part of a view of all the word properties. For example, the entry for the company name "Océ"
would be "Oce ACUTE CAPITAL TRADEMARK".
WEEKDAY: This flag indicates that the word is the name of a weekday. Weekday names from Monday to Friday are
accented in Italian, and represent one of the more frequent instances of errors in which an accent is not written,
when instead it should be. This flag, in combination with an appropriate program option, could be used for
automatic correction of weekday names written without an accent.
TRICKYINSIDE: This flag indicates that the word contains accents, but not at the end of the word. This flag is
normally only used for non-Italian words, typically French words sometimes used in an Italian context. Like WEEKDAY,
this flag allows the program to reduce the computational overhead by limiting the search of words in a
non-end-of-word-accent context to the words that may require attention even if written without a final sign.
COMPLEX=string: This string attribute is used to describe the accents in a usually non-Italian word when the
attributes for final accents (GRAVE, ACUTE, CIRCUMFLEX) are not sufficient, i.e., because the word contains
diacritical marks inside the word, and/or at the end of the word, but not of type GRAVE, ACUTE or CIRCUMFLEX.
FALLBACK: This flag marks the last rule in the first part of the list. A fallback rule is of type "*a", "*e", "*i", "*o", "*u",
i.e., it is used only in the lists associated to words ending with the five vowels, and indicates the fallback rule to
apply when the previous rules, which are parsed sequentially from top to bottom, produced no match. This flag has
no functional purpose other than to mark the boundary between the two parts of the list, as explained below.
POSTAPOSTROPHE: this attribute, used in a dedicated list, marks those particles such as the English "s" and "ll"
which are written after an apostrophe. These particles are used to correct signs incorrectly changed to an accent,
which is a condition that occurs when applying Italian rules to non-Italian words, as is the case with words that
usually precede such particles. At the same time, these particles represent, in many languages, the only cases in
which apostrophes are used at all. Recognizing these occurrences increases the reliability of system 100 when
applied to a multilingual context.

NOTITALIAN: this attribute is used to allow certain word entries in the rules lists to be recognized as not being
Italian words, so that they can be considered even if the current language context is not Italian, which would normally
disable system 100. This allows, for example, placing the correct accent on non-Italian words such as the noun
"José", which might normally fail even a simple Italian bigrams test. Use of this flag is optional, and also depends
on the advantages it brings with consideration to the procedure used for language detection, if any.

[0083] Flags are optionally combined, if more than one flag applies to the same suffix or word. For example, a word
may exist, with different meanings, with no sign, with an apostrophe, and with an accent. Such a condition is rare, but
exists. Great care is placed in compiling the rules that are part of the lists, because once a word or suffix rule becomes
part of a list, in one embodiment system 100 considers all words for which there might be a match, not just some
words. In one particular embodiment, entries with no flags have no meaning, and are not permitted.
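
As an illustration of how entries and combined flags might be represented, the following sketch parses one-line
entries of the form shown in the TRADEMARK example ("Oce ACUTE CAPITAL TRADEMARK"). The names `EntryFlag`,
`Rule` and `parse_rule` are hypothetical, the one-line format is an assumption, and string attributes such as
INFORMATION=string are left out for brevity.

```python
from dataclasses import dataclass
from enum import Flag, auto


class EntryFlag(Flag):
    """A subset of the flags listed above; string attributes are omitted."""
    NOTHING = auto()
    GRAVE = auto()
    ACUTE = auto()
    CIRCUMFLEX = auto()
    APOSTROPHE = auto()
    APOSTROPHERARE = auto()
    CAPITAL = auto()
    TRADEMARK = auto()
    WEEKDAY = auto()
    FALLBACK = auto()


@dataclass(frozen=True)
class Rule:
    pattern: str       # normalized word, or suffix without the leading '*'
    is_suffix: bool    # True when the entry started with '*'
    flags: EntryFlag


def parse_rule(line: str) -> Rule:
    """Parse a hypothetical one-line entry into a Rule."""
    head, *names = line.split()
    if not names:
        # The text states that entries with no flags are not permitted.
        raise ValueError(f"entry {head!r} has no flags")
    flags = EntryFlag(0)
    for name in names:
        flags |= EntryFlag[name]   # raises KeyError for an unknown flag name
    return Rule(head.lstrip("*").lower(), head.startswith("*"), flags)
```

A bit-flag type is used here only because the text says flags may be freely combined; any set-like representation
would serve equally well.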
[0084] For the purpose of determining whether the input of a universal activator event in an Italian context refers to
an apostrophe, an accent, or an error, in one embodiment system 100 places a higher priority on the collection of data
about words and suffixes that are often incorrectly written with a sign, and those that are written with an apostrophe.
These two cases, which are covered through word rules rather than suffix rules, are excluded in an exhaustive way
before focusing on accents, since accents use more suffix rules, including fallback suffix rules.
[0085] In one particular embodiment the lists are parsed sequentially, top-to-bottom. System 100 is optionally
modified to reduce the number of comparisons, using a variety of possible data structures for sorting and searching, which
are well known. When system 100 determines a positive match between the current word and a rule, the search is
ended unless a second search is necessary in the second part of the lists.

[0086] Rules that represent exceptions to other rules appear higher in the list than the rules of which they are an
exception. Rules are also placed higher on the list based on frequency considerations. The 20 most frequently used
rules cover more than 90% of Italian accents, which facilitates using a simple sorted list. In a sorted list, careful
placement of the rules based on frequency is one factor that improves performance. In one embodiment, rules are preceded
by their exceptions. For example, a rule indicating that "*che", i.e., words ending with "che", can have either an acute
accent or no accent or apostrophe, is preceded by exception entries for words such as '^icche'', which can be written
either without any sign or with a grave accent. If a "*che" rule is placed on top of the list for frequency or access speed
reasons, then all of its exceptions are placed before it as well.
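
The sequential, first-match parsing with exceptions placed ahead of the rules they override can be sketched as
follows. The list contents are a hypothetical fragment: the "*che" rule comes from the text, the exact-word entry is
only an illustrative stand-in for an exception, and the final one-letter entry is shown for the "E" list as an assumption.
Words are assumed to be already normalized.

```python
# Hypothetical fragment of the "E" list as (entry, flags) pairs in priority order.
# Exceptions come before the rules they override; the one-letter rule is last.
E_LIST = [
    ("affinche", {"ACUTE"}),                   # illustrative exact-word entry
    ("*che", {"NOTHING", "ACUTE"}),            # suffix rule described in the text
    ("*e", {"NOTHING", "ACUTE", "FALLBACK"}),  # assumed one-letter fallback entry
]


def first_match(word, rules):
    """Scan `rules` top to bottom and return the first matching (entry, flags)."""
    for entry, flags in rules:
        if entry.startswith("*"):
            if word.endswith(entry[1:]):
                return entry, flags
        elif word == entry:
            return entry, flags
    return None
```

Because the scan stops at the first hit, list order alone encodes both exception priority and frequency-based
placement, which is why no separate priority field appears in this sketch.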

[0087] Each of the "five vowel lists" contains a fallback suffix rule entry (SUFFIX flag), being the shortest possible
suffix rule, i.e., the one vowel to which the list itself refers (e.g., "*a", "*e", etc.), and following all other rules, although
additional entries may follow in a second part of the list. This entry is also marked with the FALLBACK flag, although it
could, in theory, be implicitly identified by the fact that it is a one-character SUFFIX rule. For words ending with a vowel,
a fallback rule indicates the statistically best fallback description for words ending with each vowel when none of the
previous rules matched. A fallback rule typically indicates that words ending with that vowel either have no sign, or they
have a certain type of accent. For example, the fallback rule for words ending with E indicates that those words, unless
covered by rules appearing higher in the list, have either no sign, or they have an acute accent. If for example the user
writes an unknown word ending with "e", and followed by an apostrophe, system 100 transforms the vowel + apostrophe
combination into the vowel with an acute accent ("é"), rather than leaving the apostrophe, because the fallback rule had
no APOSTROPHE flag. For this reason complete rules are provided that cover words ending with E that are written with
an apostrophe, or that end with a grave accent, or that are often written with a sign even if they should not, as well as
additional safety nets to recognize apostrophes that may in fact be closing single quotes, or English or German
possessives, or similar particles used in combination with an apostrophe sign.

[0088] In one particular embodiment of the invention, when a rule entry contains the flag NOTHING and no other
flag of type GRAVE, ACUTE, CIRCUMFLEX or APOSTROPHE, and the input contains an accent on the last character
of the word, or an apostrophe after the last character, then system 100 removes that sign. If however the sign was a
sign used as a single closing quote, and the context is such that a single closing quote is indeed expected, then the sign
is not removed. Also, if after the removal of a sign equal or similar to an apostrophe, a POSTAPOSTROPHE word string
follows, as soon as this second word string is complete (i.e., after the first non-word character following the string), then
system 100 retroactively re-corrects (i.e., further modifies a corrected word, including undoing a correction) the previous
correction, reinserting the apostrophe. A peculiarity of writing truncated Italian words ending with an apostrophe is that
if the last character of the truncated word is a consonant, then the apostrophe also acts as a spacing character between
that word and the following one, i.e., no space character is used between the two words. System 100 automatically
removes an incorrect apostrophe sign by taking this into consideration, in order to place an appropriate space character
where necessary. As described elsewhere, system 100 provides for different ways to override the automatic removal or
change of a sign, for example by manual editing or by a repeating input initiating an IME loop.

[0089] A rule entry containing only one flag of type GRAVE, ACUTE, CIRCUMFLEX or APOSTROPHE expresses
a very clear statement about matching words, indicating that any matching word is not written without a sign, but only with
a sign, and also indicating the exact sign. This not only eliminates ambiguities in a context such as that where a closing
single quote is expected, and the user pressed the apostrophe key, but it is also used to place missing signs when the
text input stream did not contain any special signs. This may be implemented as an alternative embodiment. Such an
embodiment is optionally enabled for a selection of words commonly written without signs even when they actually need
one, such as weekdays. In general, the single flag is used to apply the correct sign on or after a word if a sign was also
present in the input text.

[0090] When a rule entry contains multiple flags, of which one is of type NOTHING, and only one other flag is of
type GRAVE, ACUTE, CIRCUMFLEX or APOSTROPHE, then system 100 places the appropriate sign if an apostrophe
or similar character follows the word, or if an accent is on the last vowel of the word. Optionally, system 100 is
programmed to leave accented vowels as they are, for example in interactive mode, where they are the result of an explicit
selection of an accented key as opposed to the selection of the apostrophe, and to apply rules logic to the output only
when apostrophe characters, or some other specific activator event, appear in the input. Also, as already mentioned,
system 100 is optionally set not to apply any changes when accented characters are found inside, not at the end of, words.
This condition is detected when working on clipboard or file data, and during typing where action is deferred to when the
end of the word has been reached.

[0091] Word or suffix entries with more than one of the GRAVE, ACUTE, CIRCUMFLEX or APOSTROPHE flags
are rare in Italian, and usually consist of either GRAVE or ACUTE and APOSTROPHE+APOSTROPHERARE. The
default setting in one embodiment is to ignore APOSTROPHE+APOSTROPHERARE flags, resulting in simpler entries
consisting of NOTHING and/or GRAVE or ACUTE. Optionally, a few entries may remain with more than one of the
GRAVE, ACUTE, CIRCUMFLEX or APOSTROPHE flags, which may also be combined with the NOTHING flag. In
these cases, system 100 may not automatically make changes to the input stream, but instead uses the flags to display
an information or warning message in case the input does not match any of the flags, or to place certain signs before
others in the desired order for the IME loop. Statistical analysis of Italian texts has shown that the only cases in which
multiple flags appear are entries having the form: (with or without NOTHING) + (usually GRAVE, but sometimes
ACUTE) + (APOSTROPHE, sometimes combined with APOSTROPHERARE), and that the entry may be left
unmodified if the input contained no accent or apostrophe, while an apostrophe in the input may be accepted as the accent,
which is statistically considerably more frequent than the apostrophe, even if APOSTROPHERARE is not present. The
following is a sample entry of such a multiple-flag word, which also includes an information message that system 100
could optionally display:

di NOTHING GRAVE APOSTROPHE INFORMATION="di = preposition (as in "di più"); dì = day; di' = you say
(imperative)"
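
The flag combinations discussed in the preceding paragraphs can be summarized in a small decision sketch. This is a
deliberately simplified reading under stated assumptions: the function name `decide`, its return values and the
`ignore_rare_apostrophe` parameter are hypothetical, closing-quote context is not modeled, and the optional
embodiment that adds missing signs to unsigned input is left out.

```python
SIGNS = {"GRAVE", "ACUTE", "CIRCUMFLEX", "APOSTROPHE"}


def decide(flags: set, input_has_sign: bool, ignore_rare_apostrophe: bool = True) -> str:
    """Return the action for a matched rule entry.

    Results: "remove" (drop the sign), "keep" (leave the input unchanged),
    "ask" (no automatic change; rely on messages or the loop), or the
    name of the single sign to place.
    """
    signs = flags & SIGNS
    if ignore_rare_apostrophe and "APOSTROPHERARE" in flags:
        # Default setting described in the text: ignore rare apostrophe forms.
        signs = signs - {"APOSTROPHE"}
    if not signs:
        # NOTHING only: an accent or apostrophe in the input is removed.
        return "remove" if input_has_sign else "keep"
    if len(signs) > 1:
        # Several candidate signs: this sketch makes no automatic change.
        return "ask"
    if not input_has_sign:
        # Unsigned input is left alone in this sketch.
        return "keep"
    (sign,) = signs
    return sign
```

For the sample entry above, which carries NOTHING, GRAVE and APOSTROPHE, this sketch returns "ask"; the text
additionally allows accepting the apostrophe as the accent in that case, which is not modeled here.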

[0092] In one embodiment of the invention, a POSTAPOSTROPHE condition overrides a previous automatic
conversion of an apostrophe to an accent, or the removal of an apostrophe, even as part of an IME loop, for example with
a user trying to write "Josh's car is red". This is because a POSTAPOSTROPHE condition typically indicates a
non-Italian context in which an Italian word or suffix rule was applied inadvertently, or in which an IME loop was initiated
inadvertently.
[0093] It should be noted how the inclusion in the rules of English words such as "she", "he", "we", as well as
proper nouns, all with the NOTHING flag, as long as these words are not accented even in their possible Italian word
equivalents, combined with POSTAPOSTROPHE entries, creates a double barrier against possible misinterpretations
and incorrect changes of apostrophe characters used in a "non-Italian fashion", as in "I'll go home". The additional word
entries are useful because the POSTAPOSTROPHE entries cause a re-correction or further modification after the user
may already have been slightly confused by a temporary incorrect change. The additional entries help to prevent these
cases, which are rare in an Italian context.

[0094] Optionally, the lists of rules associated to words ending with a vowel may have a second part of entries, after
the fallback rule. In this embodiment, the second set of entries does not change the results that would be achieved by
applying the first set. An entry in the second part conflicting with the rules appearing in the first part, including the
fallback suffix rule, is considered an error in the data structure. While the first part of the list has a priority on performance,
achieved using suffix rules, the second part additionally specifies words, and in rare cases suffixes, that are already
covered by suffix rules in the first part, but which in the second part are listed in detail. Such information is used to
produce more accurate results in a context where single quotes are used, or in a more professional editorial context where
unknown words should not be processed applying a generic suffix rule, but rather be double-checked manually, as well
as to provide additional information to the user about an automatic change. When a word, not a suffix, rule appears in
the first part of the list, that word is considered a positive confirmation that the word exists in Italian, and that it is written
as indicated by the attributes for that rule. When a suffix rule appears in the first part of the list, then that is treated as
a generic rule, and not as exhaustive information. Thus, the NOTHING flag is used on suffix rules that appear in the first
part of the list. For example, a suffix rule describing words ending with "che" has both flags for NOTHING, and for
ACUTE. If the user wrote "affinche'", system 100 correctly outputs "affinché". However, if the user wrote "affinche'" in a
context where a pending closing single quote was detected, system 100 determines whether it would be best to treat the
apostrophe as a closing single quote, or rather as an accent. Different fallback behaviors are defined for these
conditions, including the display of a warning message, and access to statistical data about the likelihood of a closing quote
at a certain distance from the opening quote, in addition to or in lieu of the frequency of an accent on an unknown word.
If the second and longer part of the list, which is accessed in these more ambiguous cases, included an entry
for "affinche" having only the ACUTE flag, then system 100 determines that the word "affinché" exists, and that the word
is always written with an accent, and therefore the apostrophe character was intended as an accent for "affinché". The
input of a second apostrophe character, should one follow, is interpreted as a closing quote.
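
The "affinche'" example inside a pending single quote can be sketched as a small decision function. The name
`interpret_apostrophe` and the two probability parameters are hypothetical; they merely stand in for the statistical
data mentioned above, whose estimation the text does not specify.

```python
def interpret_apostrophe(word, second_part, p_closing_quote, p_accent):
    """Decide how to read an apostrophe typed after `word` while a single quote is open.

    `second_part` maps exact normalized words to their flag sets.
    `p_closing_quote` and `p_accent` are assumed likelihood estimates.
    """
    flags = second_part.get(word)
    if flags is not None and "NOTHING" not in flags:
        # Exact entry that is always written with a sign: the apostrophe was
        # meant as that sign, so it is not consumed as a closing quote.
        return "accent"
    # Only a generic suffix rule applies: choose the more likely reading.
    return "closing_quote" if p_closing_quote > p_accent else "accent"
```

A second apostrophe typed after an "accent" result would then be the one treated as the closing quote, as the
paragraph above describes.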

[0095] The second part of the list defines in detail, with separate word entries, what should already be included by
suffix rules in the first part, but which these suffix rules do not completely describe and at the same time limit in an
exhaustive and complete way. In addition to the cases described above, such as single quote context and editorial context,
another possible application of the entries in the second part is changing the color of a status indicator from one color
indicating that a suffix rule is applied, for example yellow, to a different color, for example green, when an exact word
match was found. Also, whereas suffix rules are more open, generally including the NOTHING flag to account for
possible unaccented words matching that rule, exact word entries need not do the same unless an exact word can be
written both with and without accent, making it possible to automatically add accents and apostrophe characters even if the
user placed no sign. Except for very specific cases, such as weekdays and a few other common mistakes, such
automatic behavior is not utilized in one particular embodiment, for example because Italian still has many words that
can be written either with or without a sign, which is not particularly conducive to safe automatic action, although
automatic action can still be optionally utilized.

[0096] To further clarify, where system 100 includes vowel-lists divided in two parts, the first part is designed in such
a way that correct output is generated when the user enters, for example, the apostrophe key after a word. The second
part of the list provides additional certainty, which is normally not required. The flags for the exact word entries in the
second part of the list match or are a subset of the flags for the matching suffix rule in the first part.

[0097] In rare cases it is possible to also use suffix rules as opposed to word rules in the second part of the list. If
they are used, then the entries are treated as being as "authoritative" as exact word entries. One case in which a suffix
rule may be used in the second part of the list, for example, is for suffixes such as "ventitré", which means "twenty-three",
and which can be appended to an unlimited combination of other numbers. Because the suffix rule "ventitre"
has a string which makes it detailed and precise enough so as to be particularly applicable to words meaning numbers,
it can safely be used without the NOTHING flag, to mean that all words ending with "ventitre" are accented.
[0098] Word entries have the same meaning and are treated the same way both in the first and the second part of
a list, and do not need to be repeated twice. Suffix rules are authoritative, meaning that they give sufficient certainty
about most or all matching words even in certain unusual circumstances such as single quote contexts. Word entries in
the second part of the list generally do not conflict with suffix entries in the first part where they have either the same
flags or a subset thereof.

[0099] Depending on the implementation, it is possible to optionally not include entries for the type of rules
described as belonging to the second part. Conversely, on a sufficiently fast system, or using different data structures,
the two lists are capable of being merged into a single list. Separation of the lists maintains the first list as short as possible,
yet where called for in exceptional cases, additional and more exhaustive data in the second list may be
accessed.
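
By way of illustration only, the interplay between generic suffix rules in the first part of the list and exact word entries in the second part can be sketched as follows. This is a minimal sketch and not the implementation of system 100: the names `Sign`, `FIRST_PART_SUFFIX_RULES`, `SECOND_PART_WORD_ENTRIES` and `resolve_apostrophe`, the sample entries, and the simplification to a single accent type are all assumptions made for the example.

```python
from enum import Flag, auto

class Sign(Flag):
    NOTHING = auto()   # the word may be written with no sign
    ACUTE = auto()     # the word may end with an acute accent

# First part (hypothetical entries): short, generic suffix rules.
FIRST_PART_SUFFIX_RULES = {"che": Sign.NOTHING | Sign.ACUTE}
# Second part (hypothetical entries): exact words, consulted in ambiguous cases.
SECOND_PART_WORD_ENTRIES = {"affinche": Sign.ACUTE}

ACUTE_VOWELS = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}

def resolve_apostrophe(word, pending_closing_quote):
    """Return the text that replaces `word` followed by an apostrophe."""
    key = word.lower()
    suffix_flags = next(
        (flags for suffix, flags in FIRST_PART_SUFFIX_RULES.items()
         if key.endswith(suffix)),
        None,
    )
    if suffix_flags is None or Sign.ACUTE not in suffix_flags:
        return word + "'"  # no applicable rule: leave the apostrophe as typed
    if pending_closing_quote:
        # Ambiguous case: only an exact entry without NOTHING is conclusive.
        exact = SECOND_PART_WORD_ENTRIES.get(key)
        if exact is None or Sign.NOTHING in exact:
            return word + "'"  # treat the apostrophe as a closing quote
    return word[:-1] + ACUTE_VOWELS[key[-1]]
```

In this sketch the second part is consulted only when a closing single quote is pending, which mirrors the idea that the longer list is accessed in the more ambiguous cases; the sample suffix is assumed to end in a vowel.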

Sample Implementation: Processing of Italian Text Input 

[0100] In one embodiment of system 100, system 100 is optimized for Italian writing. In one particular embodiment,
system 100 is interfaced with a MICROSOFT WINDOWS operating system 126, available from MICROSOFT CORPORATION,
as an input hook wherein system 100 has access to keyboard 112 and mouse events as they occur, and further
provides the ability to simulate keyboard input independent of user actuated keyboard input. For example, in a
particular embodiment, where application 130 currently processing input is MICROSOFT WORD, system 100
uses specific functions, documented by MICROSOFT, to get current context and language information from MICROSOFT
WORD. Similar interfaces could be used for other programs and for other operating systems such as LINUX, available
from multiple sources including RED HAT, INC., MAC X available from APPLE COMPUTER, INC., etc., where
available. If application 130 is not MICROSOFT WORD, or another application providing access to information such as
text context and language, then interfaces of operating system 126 such as those provided by MICROSOFT WINDOWS
are used to collect text context, language and current cursor position information. For example, the interface specifications
for MICROSOFT WINDOWS include Active Accessibility and Input Method Editor (IME). Such operating system-wide
interfaces only give meaningful results when the user is writing using an application that provides such data to the
operating system, which in turn can then pass it to an application such as system 100. For example, in order to use IME
functionality, the application used for writing must be IME-aware. Where the application supports no interface to provide
text context and language data, system 100 described here can still obtain current word context from the local copy of
the context which is determined from the keyboard input stream, but which may be lost in cases such as vertical cursor
movement, use of the mouse to move the cursor, or selection of a command, either via the menu or via a keyboard
accelerator combination of keys. To reconstruct context in this case, it is still possible to use system 100 functions to try
and determine at least the text cursor, also called caret, a blinking line, block, or bitmap in the client area of a window.
The caret typically indicates the place at which text or graphics will be inserted. If it is not possible to determine the
cursor position through system functions, system 100 analyzes the display memory to detect a small flashing object. Once
the cursor position has been determined, in the case of a bitmapped display, system 100 applies optical character
recognition (OCR) algorithms 122 to the bitmap contents, with particular focus on the letters to the left of the cursor. OCR
122 is particularly effective on screen bitmaps, both because the character data is clean, i.e., not rotated or disturbed by
printing or medium imperfections, and because system functions are used to determine the fonts that are currently in
use, thereby facilitating the OCR process. Since the possible fonts are known, the font possibilities are limited to a
particular list. If the display is not bitmap-based, but character-based, then system 100 extracts text directly from display
memory. In most embodiments, text context is not required to be determined immediately after it is lost following a
cursor relocation. Even where context information is desirable, for example where the user moves the cursor and
immediately afterwards presses the apostrophe key to edit an Italian word, OCR 122 or other context analysis routines only
need to succeed in obtaining the current word. Even where only a few characters before the current cursor position can be
obtained, this is sufficient to apply Italian rules, which in most cases are suffix-based: the ending of words is determinant
of accent information, which is where system 100 typically works. For example, in Italian the whole accent and
apostrophe issue is typically primarily determined by the suffix of the word. Additionally, because of the statistically unlikely
chain of events required to occur in order to result in a possible failure, system 100 functions reliably even when context
is temporarily lost as part of normal writing use. On-screen OCR is most likely to succeed on the characters that are
determinative, i.e., the current word or at least a part containing the relevant suffix of the current word, because these
parts are most likely to be visible immediately before the cursor position, rather than being located on a different line, or
covered by another window. OCR analysis is optionally closely coupled with an Italian rules parser which stops the
analyzing of text right-to-left in the event a positive suffix-rule match occurs.
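
By way of illustration only, a right-to-left scan that stops at the first positive suffix-rule match might look like the following. This is a minimal sketch under stated assumptions: `SUFFIX_RULES`, its two sample entries and the function name `match_suffix_right_to_left` are hypothetical, and the characters are assumed to have already been recovered (for example by OCR 122 or from the local context buffer).

```python
# Hypothetical suffix rules: suffix -> sign to place on the final vowel.
SUFFIX_RULES = {"che": "ACUTE", "ventitre": "ACUTE"}

def match_suffix_right_to_left(text_before_cursor):
    """Read characters leftward from the cursor; stop at the first rule match."""
    suffix = ""
    for ch in reversed(text_before_cursor):
        if not ch.isalpha():
            break  # word boundary reached without a match
        suffix = ch.lower() + suffix
        if suffix in SUFFIX_RULES:
            return suffix, SUFFIX_RULES[suffix]  # no need to read further left
    return None
```

Because the scan ends as soon as a rule matches, only the last few characters of the current word need to be available, which is consistent with the observation that a partially recovered word is usually sufficient.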

[0101] If system 100 determines that the text stream needs to be modified, for example to replace a vowel and an
apostrophe with an accented vowel, system 100 adds artificially generated information to the output stream, generating
characters such as a backspace key input followed by an accented character. Where application 130 or operating system
126 supports this, in a particular embodiment one string is directly replaced with another one without requiring
simulation of the progressive deletion of the old string. In either case, the local context buffer is also updated accordingly.
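
By way of illustration only, the generation of simulated backspace and character input, together with the matching update of the local context buffer, can be sketched as follows. The function names and the use of the backspace control character as a stand-in for a simulated key event are assumptions of this example, not features of any particular operating system interface.

```python
BACKSPACE = "\b"  # stand-in for a simulated backspace key event

def replacement_keystrokes(old, new):
    """Keystrokes that turn `old` (text left of the cursor) into `new`."""
    common = 0
    while common < min(len(old), len(new)) and old[common] == new[common]:
        common += 1
    # Delete only what differs, then type the new tail.
    return BACKSPACE * (len(old) - common) + new[common:]

def apply_keystrokes(buffer, keystrokes):
    """Update the local context buffer the way the target application would."""
    for key in keystrokes:
        buffer = buffer[:-1] if key == BACKSPACE else buffer + key
    return buffer
```

For example, turning "perche'" into "perché" requires two simulated backspaces followed by the accented vowel, and applying the same keystrokes to the local buffer keeps it in step with the application.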
[0102] In one embodiment of the invention where Italian is addressed, the apostrophe character, and similar characters
that may be present on a keyboard or character set, as well as all accented vowels, are utilized as activator event
sequences, since it has been determined that utilization of such activator events provides a method for providing correctly
accented and punctuated words in a manner that is intuitive to a user of the Italian language. In utilizing the apostrophe
character, system 100 avoids interpretation errors without requiring a larger sized dictionary. System 100
processes Italian end-of-word conditions such as accent, apostrophe, and no sign, using general suffix rules, which are
capable of being implemented using a limited word dictionary or even no word dictionary. Regardless of the size of
the word dictionary utilized by system 100, the user may enter a new word, or may start writing in another language, or
may write an Italian word that may exist both with and without a sign on or after the last vowel. In such cases, system
100 estimates the most likely intention of the user for pressing the apostrophe key. If system 100 finds no specific likely
reason for the apostrophe, other than it being part of the word, then the word suffix rules are applied; otherwise the
apostrophe is left as is, and is not treated as an activator event. For these reasons, in one embodiment a list of the most
frequent words with an apostrophe or accent is provided, as well as words which commonly are mistakenly written with
such a sign, regardless of whether system 100 has a suffix rule that already would produce an accurate result for this word.
For example, a suffix rule saying that the sign normally used on words "*ché", i.e., all words ending with "che", is an
acute accent as in "perché", would work very well if the user wrote:

perche' 

at the beginning of a sentence. The word is automatically converted to:

perché

Where system 100 encountered:

'perche'

system 100 determines to convert the input to:

'perché

if the complete word "perché" was found in the dictionary, and the data in the dictionary made it clear that the word only
existed with an acute accent. If the word was unknown, or was such that it could be written both with and without accent,
usually indicating two different meanings, then system 100 leaves the apostrophe unmodified, assuming it was a closing
single quote. Different default behaviors for system 100 are optionally set.
[0103] In the previous example, i.e., in a system containing an exact word entry for "perché", when the user presses
the apostrophe activator event for the second time, system 100 produces:

'perché'

such that system 100 recognizes both the pending closing single quote condition, and the accented word. The IME loop
places the vowel + acute accent + apostrophe combination in a second position, after the first press of the apostrophe
has resulted in vowel + acute accent, yielding an intuitive and efficient input sequence for this particular context.
[0104] In one embodiment of the invention, two major aspects differentiate the default behavior of system 100 when
applied in an interactive context such as keyboard input, compared to a non-interactive stream such as file or clipboard
data:

1. Accents inside words are not corrected in non-interactive mode;

2. Repeated activator events, e.g., apostrophe characters, do not initiate IME loops in non-interactive modes.

In an alternative embodiment of system 100, three aspects differentiate the default behavior of system 100 when
applied in an interactive context such as keyboard input as compared to a non-interactive stream, such as file or
clipboard data:

1. Accents inside words are not corrected in non-interactive mode;

2. Repeated activator events (e.g., apostrophe characters) do not initiate IME loops in non-interactive modes;

3. Spacing characters are not automatically inserted as part of the automatic processing of accent and apostrophe
characters in non-interactive mode.

The differences describe the additional control provided by interactive mode, which is typically not available when
working on a non-interactive input stream, although the additional control may be optionally utilized in a non-interactive mode
if desired. For example, if the present system were applied with the purpose of converting 8-bit Italian text data to 7-bit
data such as a plain ASCII character set, apostrophe characters, which are part of the ASCII set, could be used in the
text in place of accents, which are generally not part of the ASCII set. This results in a text with simple apostrophe
characters instead of more complex control sequences, that is readable by humans, and which could be processed by
system 100 for re-conversion to 8-bit data. In some instances, where during conversion to 7-bit data system 100 detects
that the output of a single apostrophe character would be such that re-conversion to 8-bit data would produce a result
different from the original, system 100 outputs multiple apostrophe characters, in which case functionality comparable
to interactive IME loops is provided in non-interactive contexts.

[0105] System 100 provides an option to indicate whether accents should be placed on upper case letters. In one
embodiment, the default for Italian is Yes. If the setting were No, as with some languages such as French and for certain
editorial styles, one setting provides all upper case letters with no sign, regardless of their position in the word. In one
embodiment system 100 removes accent signs and appends an apostrophe to the end of the word if an accent was
removed from the last vowel of the word. System 100 provides a similar option to indicate whether accents should be
placed on lower case letters. An application of this option is, when set to No in combination with the option to not place
accents on upper case letters, to create a pure 7-bit text. System 100 also provides settings to change the apostrophe
character which is appended at the end of words when an accent was removed as a result of a setting indicating not to
place accents on upper or lower case letters. By default in one embodiment, the apostrophe character is used, but
different characters are optionally used, for example the grave and the acute accent characters, depending on whether the
accent removed from the vowel at the end of the word was grave or acute.
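
By way of illustration only, the removal of accents with a replacement mark appended at the end of the word can be sketched as follows. This is a minimal sketch assuming Unicode input and the Python standard library `unicodedata` module; the function name `to_seven_bit` and its parameters `grave_mark` and `acute_mark` are hypothetical, and the sketch checks the last letter of the word rather than searching for its last vowel.

```python
import unicodedata

GRAVE, ACUTE = "\u0300", "\u0301"  # combining grave / acute accents

def to_seven_bit(word, grave_mark="'", acute_mark="'"):
    """Strip accents; if the last letter lost one, append a replacement mark."""
    # Decompose each character into its base letter plus combining marks.
    decomposed = [unicodedata.normalize("NFD", ch) for ch in word]
    plain = "".join(d[0] for d in decomposed)
    last = decomposed[-1] if decomposed else ""
    if GRAVE in last:
        return plain + grave_mark
    if ACUTE in last:
        return plain + acute_mark
    return plain
```

With the default marks, "perché" becomes "perche'" and "città" becomes "citta'"; passing a different `grave_mark` or `acute_mark` corresponds to the setting that distinguishes the removed accent by the appended character.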

[0106] An additional option of system 100 determines whether foreign words appearing in the word lists (ITALIANIZED
flag) should be left as written, or replaced with the original non-Italian word. The default setting in one embodiment
is to leave the words as written. Another option of system 100 is replacement rules for words with a COMPLEX=(string)
attribute, where any matching word would be replaced with the (string) (e.g., "Cezanne" would become "Cézanne"). This
option is enabled by default in one embodiment, and is helpful to properly write certain non-Italian words, usually French
words used in Italian. Yet another option of system 100 determines whether end-of-word accents or apostrophe
characters may be automatically added to words even if the input stream contained no sign at all. This option is disabled by
default, as already explained. One possible setting is to enable this option only for weekdays (WEEKDAY flag) or for
words with the TRICKYCOMPOUND attribute. Another setting enables the option for all words which have no NOTHING
flag, and only one of GRAVE, ACUTE, CIRCUMFLEX or APOSTROPHE.
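
By way of illustration only, the last setting mentioned above, which enables automatic addition for words having no NOTHING flag and exactly one sign flag, can be expressed as a simple test on the flags of a word entry. The flag names are taken from the description; their representation as a Python `Flag` enumeration and the function name are assumptions of this sketch.

```python
from enum import Flag, auto

class Sign(Flag):
    NOTHING = auto()
    GRAVE = auto()
    ACUTE = auto()
    CIRCUMFLEX = auto()
    APOSTROPHE = auto()

SIGNS = (Sign.GRAVE, Sign.ACUTE, Sign.CIRCUMFLEX, Sign.APOSTROPHE)

def may_add_sign_automatically(flags):
    """True when the entry allows exactly one sign and never the bare word."""
    if Sign.NOTHING in flags:
        return False  # the unaccented spelling is also valid
    return sum(1 for sign in SIGNS if sign in flags) == 1
```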

[0107] Another option allows the user to set the level of verbosity during interactive mode, i.e., the frequency at
which tool tips are opened above the cursor to display information. The default setting in one embodiment is to display
information relating to words which may be written with more than one sign, e.g., with an apostrophe, or with an accent,
or with no sign, and which have different meanings depending on the sign that the user may decide to use. Tool tips are
also displayed by default in one embodiment when an IME loop is in progress.

[0108] An option of system 100 allows for the normalization of the apostrophe character, i.e., if the input contains a
character that is similar, but not identical, to the apostrophe character, then the input character is processed as a
user-desired standard apostrophe character. For example, many keyboards contain a grave character, which is often used
instead of the apostrophe, to which it is visually very similar, possibly resulting in inconsistent use in the text. This
option ensures that the text contains the same apostrophe character. An additional program option is associated with the
spacing apostrophe as described herein. In one embodiment, apostrophe characters as well as similar equivalent
characters are interpreted as opening single quotes if they are immediately followed by a letter or digit, and not immediately
preceded by a letter or digit. Similarly, in order to be recognized as such, closing quotes must be immediately preceded
by a letter, digit or punctuation sign, and not be followed by a letter or digit. Quote characters followed by
POSTAPOSTROPHE strings, or by two digits (as in "the summer of '90"), do not count as either opening or closing quotes. Other
techniques to recognize opening and closing quotes are utilized, for example simply requiring a single opening quote
character to be preceded by a space, new line or beginning of text, and treating every other quote as a closing quote.
All of these techniques are optionally used. Double single opening and closing quotes are recognized in a similar way.
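
By way of illustration only, the neighbor-based recognition of opening and closing single quotes described above can be sketched as follows. The function name `classify_quote` and the three result labels are assumptions of this example; the check for strings that may follow an apostrophe is omitted, and only the two-digit case is handled.

```python
import string

def classify_quote(text, index):
    """Classify the quote character at text[index]: 'opening', 'closing' or 'neither'."""
    before = text[index - 1] if index > 0 else ""
    after = text[index + 1] if index + 1 < len(text) else ""
    two_after = text[index + 1:index + 3]
    if len(two_after) == 2 and two_after.isdigit():
        return "neither"  # abbreviated year, as in '90
    if after.isalnum() and not before.isalnum():
        return "opening"
    preceded = bool(before) and (before.isalnum() or before in string.punctuation)
    if preceded and not after.isalnum():
        return "closing"
    return "neither"  # e.g. an apostrophe between two letters
```

An apostrophe standing between two letters, as in "l'amico", satisfies neither condition and is therefore left to the word rules.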

For certain languages, it may be necessary to treat single or double-comma characters, immediately followed by a letter
or digit and not immediately preceded by a letter or digit, as if they were quote characters used as opening quotes, as
in "This is an ,,example" used in German". After a single or double opening quote has been identified, system 100 sets
a corresponding flag that is cleared only after a corresponding closing single or double quote. In order to avoid leaving
the flag set by mistake, e.g., after a single quote that had incorrectly been identified as an opening quote, the flag is
cleared after a certain number of characters, words or sentences. Optionally, system 100 contains a table indicating, for
varying character, word or sentence distances from an opening single or double-single quote recognized with a certain
technique, the statistical likelihood, based on previously analyzed real-text data, that a closing item appears in that
relative position. For example, after 100 characters from an opening quote, the likelihood that the following character is a
closing quote is determined to be 1.31%. If necessary, this data is compared with a threshold below which the single
quote mode flag is cleared, or the data is compared with statistical data about the likelihood of an apostrophe character
after an unknown word being an apostrophe or accent or being unrelated to the word. In certain environments, such
as for example those requiring higher editorial standards, various contexts in which single quotes are encountered
could be flagged as warnings for user inspection without relying on automatic processing.
[0109] Optionally, system 100 issues a warning whenever the single quote mode flag is automatically cleared
because the distance from the opening quote is determined to be excessive, or because the end of the text is reached.
Also, a warning is issued if an opening single quote is encountered within a context which already is in single quote
mode. If a quote-depth counter is used instead of a simple flag, a warning is issued if, at the end of a document, or after
a certain distance from the last opening quote, the numbers of opening and closing quotes do not match.
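
By way of illustration only, the single quote mode flag and the warnings associated with it can be sketched as a small state holder. The class name `SingleQuoteMode`, its method names and the fixed character cut-off `max_distance` are assumptions of this example; a statistical likelihood table compared against a threshold could take the place of the fixed cut-off.

```python
class SingleQuoteMode:
    """Track a pending opening single quote and collect warnings."""

    def __init__(self, max_distance=200):
        self.max_distance = max_distance  # assumed cut-off, in characters
        self.opened_at = None             # position of the pending opening quote
        self.warnings = []

    def opening_quote(self, position):
        if self.opened_at is not None:
            self.warnings.append(
                f"opening quote at {position} while already in single quote mode")
        self.opened_at = position

    def closing_quote(self, position):
        self.opened_at = None  # the matching closing quote clears the flag

    def advance(self, position):
        """Call for every other character; clears the flag when too far away."""
        if self.opened_at is not None and position - self.opened_at > self.max_distance:
            self.warnings.append(f"flag cleared at {position}: closing quote too distant")
            self.opened_at = None

    def end_of_text(self):
        if self.opened_at is not None:
            self.warnings.append("flag cleared: end of text reached")
            self.opened_at = None
```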

Interactive Mode: Desired IME Loops for Italian

[0110] The action wherein system 100, based on a sequence of input events, produces different text outputs, one
replacing the other, is herein referred to as an Input Method Editor (IME) loop. The contents and arrangement of the
possible outputs, through which system 100 loops, can change depending on the application, for example writing of
Italian or another language, or the input of currency symbols, etc., user settings, and data collected during previous IME
loops. For Italian, in one embodiment the purpose of the default IME loops is to allow the user to cycle through all the
possible accent and apostrophe combinations. This includes all possible accents, the apostrophe, and the letter without
sign. The user is also allowed to write an accented letter followed by an apostrophe or single quote. For this reason this
combination of characters is also optionally part of the IME loop. Different variants are possible: the IME loop for each
vowel, for example, in one embodiment includes all combinations of accents followed by an apostrophe, or the correct
accent as determined by the rules, followed by the apostrophe. In one embodiment, balancing these two considerations,
the IME loop contains the second case because it is less likely that the user writes a word on which system 100
would place an incorrect accent, and that the same word also is followed by a single quote or apostrophe. Even such
an unusual input can be processed by system 100. Typically, in one embodiment IME loops are used in interactive mode,
i.e., during keyboard input, rather than in file and clipboard operations.

[0111] In general, in one embodiment an IME loop for Italian is initiated and used with the same keyboard key that
also serves as an activator event for the automatic placement of the correct sign at the end of Italian words. In one
embodiment the activator event is the apostrophe key, or the grave key, or any accented letter key. When an apostrophe
or grave key is pressed once after a letter, or when an accented vowel key is pressed, system 100 parses through the
rules and outputs the character or character combination determined to be correct, for example a letter with no sign, a
letter followed by an apostrophe, or a vowel with an accent, or a letter followed by a space. Thus, the first output is likely
and nominally the more correct output. The IME loop allows for a different output selectable by the user in an intuitive
manner, which in one embodiment occurs by pressing or actuation of the activator event, accented vowel, or
apostrophe, or acute, again and in succession. In one embodiment, settings and implementation options limit both the possible
key or keys that are recognized as activator events, for example to use accented letters for manual input, and the
apostrophe for automatic input, as well as the keys that can activate an IME loop, if at all.

[0112] An additional option provided by system 100 determines the behavior when different activator events are
enabled, and when such different keys are pressed one after the other. For example, when the user enters the following
6 keys: 

P-e-r-c-h-e-6-* 

the above can be considered an IME loop, equivalent with: 
P-e-r-c-h-e-'-'-' 

and 

P-e-r-c-h-6-e-6 

[0113] In one embodiment, a default implementation for the input of Italian text, an IME loop is only initiated when
the same key is pressed more than once, providing a more rigid and predictable system for the user, leaving out the
different sequences for the manual handling of exceptions. Different options of system 100 may account for different
behaviors if desired. Also, if the above first example is set not to initiate an IME loop, each of the last three characters
can be considered an activator event, causing system 100 to apply its usual rules to each character. In one
embodiment, this is the case. In one particular embodiment, if the activator event is an apostrophe, the previous context letter
is considered as if it were written without any accents, i.e., the apostrophe when pressed for the first time causes the
correct sign, accent or apostrophe or space character or no sign, to be placed by system 100. This also applies to the
case in which the user moves the cursor immediately after an existing word in a document, rather than writing the word
or part thereof, and then presses the apostrophe key or the acute key depending on what activator event is enabled.
Thus, in one embodiment if the user moves the cursor immediately after a word that already has the correct accent on
the last letter, and presses the apostrophe key, then to the user that input has no effect, other than confirming that the
existing text is already correct. Successive, repeated input of the same apostrophe key would initiate an IME loop. A
different implementation or option allows the IME loop to work in such a way that when the user writes multiple but
different activator events one after the other, these all contribute to the activation of the same IME loop.
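
By way of illustration only, the default behavior in which an IME loop advances only while the same activator key is pressed repeatedly can be sketched as a small state machine. The class name `ImeLoop`, its interface and the sample list of outputs are assumptions of this example; the list of outputs is taken as given and would in practice be produced by the rules for the current word.

```python
class ImeLoop:
    """Cycle through precomputed outputs while the same activator key repeats."""

    def __init__(self):
        self.last_key = None
        self.step = 0

    def press(self, key, outputs):
        if key == self.last_key:
            # Same key again: advance, wrapping around at the end of the loop.
            self.step = (self.step + 1) % len(outputs)
        else:
            # A different key starts over from the rule-based first output.
            self.last_key, self.step = key, 0
        return outputs[self.step]

    def reset(self):
        """Call when any non-activator input is received."""
        self.last_key = None
        self.step = 0
```

For instance, with a hypothetical three-step list of outputs, the first press returns the first entry, each repeated press of the same key returns the next entry, and the fourth press returns the first entry again. The alternative option, in which different activator keys contribute to the same loop, would simply omit the comparison with `last_key`.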
[0114] Another implementation or program option could affect system 100 behavior in such a way that if the
apostrophe or acute character, or any other character considered similar, is pressed after an accented letter, then that letter
is not considered without accent, as in one embodiment, but rather, an IME loop is immediately initiated, causing the
apostrophe to change the accent to the next step in the IME loop, rather than starting from the rule placing the correct
sign.

[0115] In one embodiment the rule for the user is predictable, for example the first press of an activator event results
in a correct sign. In a particular embodiment, the second press, which initiates the IME loop, produces a character that
is always the same, and in an alternative embodiment where a single closing quote is expected it places the accent
followed by the apostrophe in the immediately next position of the loop. Also, if the user repeatedly, or even just once, used
the IME loop functionality to change a sign after a certain word, in one embodiment system 100 remembers this and
automatically adds an appropriate entry to its rules, or alternatively displays a message, either instantly or when the
user asks to view a list of words that were manually changed, proposing to do so, such that when the user again
writes the same word, the chosen sign is produced as a first result. Also, the order in which different accented
characters appear in the IME loop is optionally the same for each letter, and alternatively the flags that indicate the possible
accents or apostrophe combinations for that word are considered in order to place the known possibilities for that word
first in the list. All of these variants are optionally implemented by system 100.

[0116] In one embodiment, the IME loop begins after an activator event is pressed for the second time in a row. A
first time, system 100 considers its rules to place the correct sign. The second and following times, other characters are
produced, and when all the steps of the cycle have been exhausted, the loop begins again as if the key were pressed
for the first time. When the key is pressed for the second time, a tool tip or small information window appears over the
cursor position, with a message such as "Press again for: (choice 1), (choice 2), ... (choice n)", indicating the order in
which the following characters would appear, with the next IME step due appearing first. For example, after the user
writes "perché" using the apostrophe after the "e" to produce the last accented letter, and then presses the apostrophe
for a second time, the word is changed to "perchè", and the tool tip says "Press again for: ê é' e' é è". If the user chooses
to display the tool tips, and not only in cases such as when the IME loop was actually used as a result of repeated
pressing of the activator event, then the tool tip would have been displayed immediately at the first press of the apostrophe
key after "perche", which resulted in "perché": "Press again for: è ê é' e' é".

[0117] Very different possibilities exist for the exact implementation of the IME sequence. In one embodiment stable
predictability is optimized with consideration for a sequence that when applied to vowels results, in a predetermined
order, in the vowel with grave accent, acute accent, circumflex accent, the initial correct accent followed by an
apostrophe intended as a possible closing quote, and the vowel without accent followed by an apostrophe. For words that are
known to exist with a final apostrophe, indicated with the APOSTROPHE flag in the rules, after the grave, acute and
circumflex steps, the IME loop additionally includes the vowel followed by two apostrophe characters, one as part of the
word, and one as a possible closing quote. For consonants after which system 100 automatically removes the
apostrophe, e.g., after "qual", the loop consists of two steps, i.e., the letter followed by an apostrophe, and the letter followed
by a space character. The sequences described herein in one embodiment begin with the correct sign, which is placed
automatically by system 100 when the user presses the activator event for the first time. After the other options are
output as part of the loop, the loop continues again with the correct output, etc. When the first output causes the removal
of an apostrophe, then the letter followed by the apostrophe appears in the following position in the IME loop.
[0118] Examples of IME loops where the first output is rule-based, i.e., correct, and in this example is activated with
a first press of the apostrophe key; the last output is identical with the first, and indicates where the loop begins again, include:

perche + ' = perché -> perchè -> perchê -> perché' -> perche' -> [repeat from "perché"]
e + ' = è -> é -> ê -> è' -> e' -> [repeat from "è"]
po + ' = po' -> pò -> pó -> pô -> po'' -> [repeat from "po'"]
omicidi + ' = omicidi -> omicidì -> omicidí -> omicidî -> omicidi' -> [repeat from "omicidi"]
qual + ' = qual + SPACE -> qual' -> [repeat from "qual+SPACE"]
qui + ' = qui[apostrophe removed] -> qui' -> quì -> quí -> quî -> [repeat from "qui"]

For upper case letters the output is identical, but in upper case.
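
By way of illustration only, the construction of a vowel loop in the predetermined order described for one embodiment (rule-based output first, then grave, acute and circumflex, then the correct output followed by an apostrophe, then the plain vowel followed by an apostrophe), together with the matching tool tip text, can be sketched as follows. The function names `loop_endings` and `tool_tip` are hypothetical; the sketch covers word-final vowels only and does not handle the consonant case or the case in which the first output removes an apostrophe.

```python
GRAVE = {"a": "à", "e": "è", "i": "ì", "o": "ò", "u": "ù"}
ACUTE = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}
CIRCUMFLEX = {"a": "â", "e": "ê", "i": "î", "o": "ô", "u": "û"}

def loop_endings(vowel, correct):
    """Word endings of the loop; `correct` is the rule-based first output."""
    candidates = [correct, GRAVE[vowel], ACUTE[vowel], CIRCUMFLEX[vowel],
                  correct + "'", vowel + "'"]
    endings = []
    for candidate in candidates:
        if candidate not in endings:  # a step equal to an earlier one is skipped
            endings.append(candidate)
    return endings

def tool_tip(endings, current_step):
    """Text listing the upcoming steps, with the next step due appearing first."""
    upcoming = endings[current_step + 1:] + endings[:current_step + 1]
    return "Press again for: " + " ".join(upcoming)
```

Under these assumptions, a final "e" whose rule-based output is "é" yields the endings é, è, ê, é', e', and a final "o" whose rule-based output is "o'" yields o', ò, ó, ô, o''.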
[0119] If the activator event is an accented letter instead of, for example, an apostrophe, the output is slightly
different because the steps of the IME loop do not include entries with an apostrophe in such an embodiment unless the
rules for the current word indicate that the word is known to system 100 to exist with an apostrophe. There are no
ambiguities about possible closing quotes and other non-word apostrophe characters, as such characters would be entered
using the apostrophe key rather than using an accented letter. If the initial automatic change of system 100 transforms
the accented letter to a letter followed by an apostrophe, or to the letter without any sign at all, then that initial correct
output is at the end, a new beginning of the IME loop. For example:

5 perch + 6 = perch d perch6 perchfi -* [repeat from "JjerchdT 
6 = [repeat from "AT 

p + d = po'->p6-^p<5->p6-> [repeat from "po"*] 
omldd 4- ) = omIcldT -> omicldi -> omlddf -> (repeat from "omicldr) 
qu ) = qui{accent removed] qui quf -» qui [repeat from "quH 


[0120] If the activator event is a repeated unaccented vowel instead of an apostrophe or an accented letter, the output
is slightly different, and based on the rules for the current word, the IME loop begins either with a double vowel if the
word is known to exist with a final double vowel, or with an appropriate accent or apostrophe. For example:

IS perche + e = pefch^ -> perchn -» perchA -> perchee ~> [repeat from perch 
po ••■ o = po' PP p,-. p± poo (repeat from po'J 
zl + 1 = zil -> zS zP zT 2 V -> [repeat from zli] 
qui I s qul[no accent] qu£ -> quP -> quT qud [repeat from qui] 

[0121] Different implementations of system 100 are capable of providing different sequences, for example allowing
for all the possible combinations of accents followed by the apostrophe, rather than only the correct accent. Alternatively,
system 100 is designed to include a subset of a given implementation considering that in Italian the circumflex
accent is generally used only on the letter I, and the acute accent is generally only used with E and O. The implementation
described herein accommodates the input of foreign words, for example non-Italian words, resulting in loops that
are identical apart from the initial output, which is context-based and therefore more predictable, which is a more
intuitive system for the user regardless of the letter.

[0122] Another possible implementation of system 100 considers the ACUTE, GRAVE, CIRCUMFLEX and APOSTROPHE
flags associated with the rules entry that produced a match for the current word. Depending on the flag, the
combinations that had no matching flag are excluded from the IME loop, or alternatively appear at the end. Another
embodiment of system 100 takes into consideration whether the current word is written in a context where a closing
single or double-single quote is expected, and adds these options to the loop, optionally to the beginning of the loop in the
event the activator event was the apostrophe.
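
The flag-based ordering just described can be illustrated with a minimal sketch. The function names, the default sign order and the `exclude_unmatched` switch are assumptions made for illustration only; the specification names the four flags but does not prescribe this data layout.

```python
import unicodedata

# Assumed default order of signs; the specification does not fix one.
SIGN_ORDER = ["GRAVE", "ACUTE", "CIRCUMFLEX", "APOSTROPHE"]
COMBINING = {"GRAVE": "\u0300", "ACUTE": "\u0301", "CIRCUMFLEX": "\u0302"}


def with_sign(vowel, flag):
    """Return the vowel carrying the sign named by the flag."""
    if flag == "APOSTROPHE":
        return vowel + "'"
    # Compose the vowel and the combining accent into a single character.
    return unicodedata.normalize("NFC", vowel + COMBINING[flag])


def build_loop(vowel, matched_flags, exclude_unmatched=False):
    """Build IME loop steps: signs with a matching flag come first,
    the others are either appended at the end or left out."""
    matched = [f for f in SIGN_ORDER if f in matched_flags]
    unmatched = [] if exclude_unmatched else [
        f for f in SIGN_ORDER if f not in matched_flags
    ]
    return [with_sign(vowel, f) for f in matched + unmatched]


if __name__ == "__main__":
    print(build_loop("e", {"ACUTE"}))
    print(build_loop("e", {"ACUTE", "APOSTROPHE"}, exclude_unmatched=True))
```

Each further press of the activator event would then step through the returned list and wrap around to its first element.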

[0123] An alternative embodiment of system 100 is substantially similar to the current implementation of the
present invention except that if the activator event is the apostrophe and the rules for the current word confirm that the
word is certainly written with a certain accent or apostrophe with no ambiguity or the possibility for the word to be written
without sign, and the context is such that a single closing quote is expected, then a first press of the apostrophe
produces the correct sign after the word, and the second press adds the closing quote. Similarly, in a context where
double-single quotes are used instead of single quotes, a third press adds the second closing quote. Afterwards, the IME loop
continues with the other signs. Another alternative embodiment learns from previous user choices, and proposes an
IME loop where the most frequent previous choices appear first. The data is associated to individual words, or grouped
by letter (A, E, I, O, U, consonants). Another alternative embodiment considers the final selection resulting from the use
of the IME loop, and adds an appropriate entry in the rules, so that a following time the word is written with the same
activator event, or optionally even with another activator event, the first output without even waiting for the IME loop is
the one previously chosen through the IME loop. Different variations or program settings make it possible to make the
record temporary or permanent, and automatic or based on user action. An additional alternative embodiment allows for
the new rule to be automatically recorded for a word after the user wrote a word in a certain way thereby overriding the
initial default output of system 100 for a certain number of times, and optionally without ever accepting the default output
for that word. In a further alternative embodiment, a POSTAPOSTROPHE condition retroactively overrides a previous
automatic conversion of an apostrophe to an accent, or the removal of an apostrophe, even as part of an IME loop, for
example where the user is trying to write "Jose's car is red". This is because a POSTAPOSTROPHE condition typically
indicates a non-Italian context in which an Italian word or suffix rule is applied by mistake, or in which an IME loop
is initiated by mistake. The above alternative embodiment is optionally extended to activate a re-correction or further
modification after a POSTAPOSTROPHE string, and also in general whenever the activator event is followed by a letter
thereby placing the previous output in the middle of a word rather than at the end of it. This is useful for example for
languages where it is of advantage to give priority to correct an unmodified input of the apostrophe sign.
Additional Considerations for Italian

[0124] The addition of letting system 100 add spacing as appropriate allows the use of an activator event or
character to be similar to or equivalent to a single key press. The particular procedure is applied for automatically
inserting a space but without requiring the application of logic to place the correct accent or apostrophe. As a result, system 100
provides a reduction of key presses compared to traditional input. The following are examples thereof:

Example - Traditional input: L'alba è bella (14 key presses, requires appropriate keyboard and writing knowledge)
Example - Intermediate system: L'alba e' bella (15 key presses, easy input)
Example - Extended system: L'alba e'bella (14 key presses, easy input)

Example - Possible option/variant to 3.: L'alba èbella (13 key presses, easy input using both apostrophe and any
accented key, which is automatically corrected if necessary)

Example - Possible variant of 1.: L'alba èbella (13 key presses, requires appropriate keyboard and writing
knowledge)

In all cases the output would be L'alba è bella

[0125] An extended system embodiment accepts the input of a space key after the apostrophe. When the space
key is pressed, or if the vowel-apostrophe combination is followed by a punctuation character, no space is inserted. The
automatic space insertion occurs or is confirmed if the apostrophe is followed by a letter, number or graphical sign that
if it occurred alone as part of a sentence would require a space character before it.

[0126] For certain characters which normally occur in pairs, such as parentheses, brackets, single quotes, double
quotes and other characters, no space character is normally placed before the closing item. When these characters are
graphically different, such as is the case for the "(" and ")" parentheses, system 100 determines whether they require a
space before them or not in the event that they appear after an apostrophe input that is processed in such a way as to
possibly require the automatic insertion of a space character. Some characters can also be written as graphically
identical signs, and based on the context they are interpreted either as opening or as closing items. These characters
include the double quote and the simple quote character, which in general is the same character as the apostrophe. For
example, the user writes:

è"p . . .

In the above case, system 100 adds a space after the accented letter if the quote character is an opening quote. If the
type of the quote character, whether it is an opening or closing quote, is itself determined by the presence of a space or
a letter before it, then no result is determined and the context remains ambiguous. System 100 therefore keeps track of
opening and closing double quotes as system 100 already does for single quotes, or alternatively in such a case the
automatic insertion of the space character is deferred until the user enters an additional character after the quote
character. In such an embodiment, the type of quote is determined not by the characters before it but by the text following
it. A closing double quote is typically not immediately followed by a letter or number, but instead is followed by a spacing
or punctuation sign. By applying such a detection rule or an equivalent one, the type of double quote is determined, and
if the quote is identified as an opening double quote, a space is retroactively inserted between the accented character
or character with apostrophe and the opening double quote.
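
The deferred detection rule described in this paragraph can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, and the rule is reduced to "a letter or number after the quote means an opening quote".

```python
def is_opening_double_quote(next_char):
    """Assumed detection rule: a closing double quote is typically followed
    by spacing or punctuation, so a following letter or digit is taken to
    mean that the quote is an opening quote."""
    return next_char.isalnum()


def resolve_deferred_quote(before, next_char, quote='"'):
    """Decide, once the character after the quote is known, whether a space
    is retroactively inserted between the preceding text and the quote."""
    if is_opening_double_quote(next_char):
        return before + " " + quote + next_char
    return before + quote + next_char


if __name__ == "__main__":
    print(resolve_deferred_quote("\u00e8", "p"))       # opening quote case
    print(resolve_deferred_quote("perch\u00e9", ","))  # closing quote case
```

Deferring the decision by one character avoids the circular case in which the quote type would itself depend on the presence of the space being decided.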

[0127] The specific output of different combinations of an accented vowel followed by a closing single quote, or of
a vowel followed by an apostrophe followed by an identical sign which is a closing quote, or of an unaccented vowel
followed by a closing quote, which are rare but exist, are all possible with system 100 described herein by repeated
pressing of the apostrophe key until the desired combination is output, such as with an IME loop which includes not only
accent variations but also accent and apostrophe combinations.

[0128] In French and according to some editorial guidelines, the set of characters that are preceded by a space
includes punctuation signs which are composed of more than one graphical mark such as "?", "!", and ":". In Italian
these punctuation signs normally have no space before them. In one particular embodiment of system 100, the space
character is not automatically inserted immediately after the apostrophe is processed but after the character after the
apostrophe is input by the user. In the event where system 100 produces incorrect output which results either in a
missing space character or in an undesired space character, the user can go back one character using the backspace or
cursor left key for example, and respectively either add a space or continue writing. Such an embodiment of system 100
includes the option not to re-correct or to further modify, or to alternately re-correct or further modify after the user
overrides an automatic correction.

Sample input: la liberta'(e'perche')
Resulting output: la libertà (è perché)
(Spaces automatically added before words and other non-punctuation signs)

Sample input: la liberta'; e'"perche'"
Resulting output: la libertà; è "perché"

(No space automatically added between apostrophe and punctuation or closing quote, but added before opening quote)
[0129] In some embodiments of system 100, there are special cases in which even in an Italian context the
apostrophe sign after a vowel is not immediately followed by a space but by a letter. This is the case for example with English
possessives and other patterns, e.g., "I'd try", which in an Italian context are usually related to either English or German,
and are solved with appropriate POSTAPOSTROPHE entries. Thus, the additional step described above, like other
parts of system 100, optionally removes any space characters it automatically added if they are followed by a
POSTAPOSTROPHE string.

Sample Implementation: Processing of German Text Input

[0130] German uses the special characters "ä", "ö", "ü", "ß" in lower case, and "Ä", "Ö", "Ü" in upper case ("ß"
becomes "SS" in upper case). This is a total of 7 special characters compared to basic Italian's 12 characters. The sign
on top of the vowels is called an umlaut. Where these characters are not available on the keyboard, character set or
output system, the traditional replacements are "ae", "oe", "ue" and "ss", respectively. In rare cases, for example internet
web and email addresses, it has become accepted use to also use "a", "o", "u" instead of "ä", "ö", "ü" in both lower and
upper case. The special characters used in German are associated with needs that in the case of text input are in part
similar to Italian. Unlike Italian, however, these characters appear more frequently, appear in the middle of words, and
are more difficult to be determined. System 100 accommodates inputting these characters when they are not present
on the keyboard.

[0131] System 100 utilizes different optional ways to enter the special characters for German. In one embodiment
a keyboard hook function is utilized that intercepts the combinations of Alt + a, Alt + o, Alt + u, Alt + A, Alt + O, Alt + U,
Alt + s and Alt + S, and changes the output to ä, ö, ü, Ä, Ö, Ü, ß, SS. The activator event is set to Alt by default in one
embodiment but could be changed in other embodiments. Alternatively, input of German characters is possible through
IME loops. E after A, O or U results in the output to be changed to Ä, Ö, Ü. When E is pressed again, the less frequent
AE, OE, UE pairs are produced. When E is pressed again, the very rare ÄE, ÖE, ÜE pairs as in "Europäer" and "Böe"
are generated. If E is pressed again, the loop begins again from Ä, Ö, Ü. This applies both to upper and to lower case.
If the case of E is different than that of the first letter, the case of the first letter is the one that is applied to the output,
making it easier to write initials with umlauts. Options for different behavior are provided.

[0132] In lower case only, an IME loop that produces ß and other character combinations is activated by repeated
presses of the "s" key, as follows: s, ss, ß, ßs, sss. When the "s" key is pressed a sixth time, the loop starts again from
the simple "s", and so on. The order of the steps in this loop is based on frequency. Especially after the writing reform
("Rechtschreibreform") approved in 1996, "ss" is more frequent than "ß". Both "ßs" (as in "Großschreibung") and "sss"
(as in "Flusssand") are rare, but possible. The default initial output of system 100 is optionally made dynamic based on
rules as with Italian. For example, system 100 automatically converts AE, OE and UE pairs to the respective vowel with
an umlaut as appropriate.
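
The two German IME loops described above, the E loop after A, O or U and the repeated "s" loop, can be modelled as simple cyclic lists. The sketch below is illustrative only; the function names are assumptions, and the press counts are taken to start at 1 for the first press that activates the loop.

```python
UMLAUT = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}
S_LOOP = ["s", "ss", "ß", "ßs", "sss"]  # ordered by frequency, lower case only


def e_loop_output(vowel, e_presses):
    """Output after pressing E `e_presses` times following A, O or U.
    The case of the first letter is applied to the whole output."""
    e = "E" if vowel.isupper() else "e"
    steps = [UMLAUT[vowel], vowel + e, UMLAUT[vowel] + e]
    return steps[(e_presses - 1) % len(steps)]


def s_loop_output(s_presses):
    """Output after pressing the "s" key `s_presses` times in a row."""
    return S_LOOP[(s_presses - 1) % len(S_LOOP)]


if __name__ == "__main__":
    print([e_loop_output("a", n) for n in range(1, 5)])
    print([s_loop_output(n) for n in range(1, 7)])
```

The modulo step is what makes a fourth press of E, or a sixth press of "s", return to the first entry of the loop.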

Sample Implementation: Processing of Spanish Text Input

[0133] Spanish uses the special characters Á, É, Í, Ó, Ú, Ü, Ñ, both in lower and in upper case appearing anywhere
in the word as well as the special punctuation signs "¿" and "¡". The apostrophe character is used for single quotes and
non-Spanish patterns such as POSTAPOSTROPHE. Thus system 100 can be adapted to a system similar to an Italian
embodiment where the rules for each vowel consist of simple fallback rules with an ACUTE flag so that after the
apostrophe is pressed following a vowel, the result is the vowel with an acute accent. For consistency, the apostrophe is also
used to place the tilde on top of the N. The IME loop for the vowels toggles between all possible signs, as for Italian,
or alternatively between the acute accent, the umlaut optionally for the letter U and the vowel followed by an apostrophe.
For the N, the IME loop toggles between the N with tilde and the N followed by an apostrophe. The special signs "¿"
and "¡" are generated via an IME loop that produces the special character when "?" or "!", respectively, are pressed an
even number of times. Such an embodiment simplifies the writing of Spanish using a non-Spanish keyboard where
currently different combinations of Ctrl, Alt, Shift, Alt + digits or other difficult to enter and to remember keyboard
sequences are used depending on operating system 128 and application 130. When people write with a pen the sign
is placed after writing the vowel and not before. As with Italian, system 100 described makes the input of Spanish
intuitive for keyboard input on a keyboard without the Spanish characters.
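
As a rough illustration of the Spanish fallback behaviour described above, the following sketch maps the first press of the apostrophe after a letter, and the repeated press of "?" or "!". The function names are hypothetical, and only the first step of each loop is modelled.

```python
# Fallback rule with an ACUTE flag for every vowel (first loop step only).
ACUTE = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def apostrophe_after(letter):
    """First output when the apostrophe is pressed after `letter`."""
    if letter == "n":
        return "ñ"
    if letter == "N":
        return "Ñ"
    if letter in "aeiouAEIOU":
        return letter.translate(ACUTE)
    return letter + "'"  # other letters keep a plain apostrophe


def punctuation_output(key, presses):
    """Inverted sign on an even number of presses, plain sign otherwise."""
    inverted = {"?": "¿", "!": "¡"}
    return inverted[key] if presses % 2 == 0 else key


if __name__ == "__main__":
    print(apostrophe_after("a"), apostrophe_after("n"), apostrophe_after("t"))
    print(punctuation_output("?", 1), punctuation_output("?", 2))
```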
Sample Implementation: Processing of International Text Input

[0134] In one embodiment IME loops are used to generate all variations of a certain character when that character
is input a certain number of times, for example pressing a certain key two, three times or more to initiate the output of
special characters rather than the same letter repeated two, three or more times. In an alternative embodiment, system
100 initiates an IME loop when a certain character is pressed in combination with a qualifier key such as Alt. For
example, repeated presses of Alt + A produce all the variants of A with various diacritical signs.

[0135] In one particular embodiment a simpler approach of system 100 is provided by combining the loops for
Italian with a combination of Alt + letter filters which for example produce "ñ" when Alt + n is pressed (Spanish character),
"ß" when Alt + s is pressed (German character), "ç" when Alt + c is pressed (French character), etc., and optionally
inserting the vowel + umlaut (German character) step in the IME loops for Italian. For languages such as Greek where
certain letters may have different shapes depending on the position in the word, for example sigma at the end of a word
or beta at the beginning of a word, system 100 places differently shaped characters based on the context. At the
beginning and in the middle of words this is done immediately, whereas conditions requiring different handling at the end of
a word are processed retroactively, as soon as a non-word character is entered. Each time characters are removed or
added from or to the beginning or end of a word, the procedure dynamically applies the required changes in order to
keep the initial beta or the ending sigma correct.
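
The retroactive handling of the Greek final sigma can be sketched as below. This is a minimal illustration under the assumption that the whole text buffer is re-normalized after each change; the function name is hypothetical, and the initial beta case is not covered.

```python
import re


def fix_sigma(text):
    """Keep sigma shapes consistent with their position in the word.

    Every sigma is first reset to the medial form, then a sigma that is
    followed by a non-word character becomes the final form. A sigma at the
    very end of the buffer is left alone because the word may still be typed.
    """
    text = text.replace("ς", "σ")
    return re.sub(r"σ(?=\W)", "ς", text)


if __name__ == "__main__":
    print(fix_sigma("λογοσ"))    # word still being typed, unchanged
    print(fix_sigma("λογοσ "))   # space entered, final sigma applied
    print(fix_sigma("λογοςι "))  # letter appended after a final sigma
```

Resetting before re-applying the rule is one simple way to handle characters being added to or removed from the end of a word.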

Sample Implementation: Input of Currency Symbols 

[0136] In one embodiment of the invention, system 100 includes an option to place the Euro sign in a first or second
position based on user choice in the IME loop associated to keys such as $ or £. For example, pressing the dollar key
once produces the dollar sign, and twice it produces a Euro sign, or vice-versa. More complex loops generating a wider
variety of currency symbols are associated either to an existing currency key or to an otherwise unused key such as the
backslash ("\"). As considered for the input of Italian, the order in which the currency signs appear in the IME loop is
changed dynamically in one embodiment. System 100 automatically outputs the most frequently-used currency symbol
when a certain key is pressed the first time, and then in order of frequency of use produces IME steps when the key is
pressed again. Alternatively system 100 maintains the first character output constant, for example to be identical with
the character normally associated to the keyboard key, and affects only the output of the following characters such as when
that key is pressed more than once. The dynamics by which the order of the IME steps changes is controlled by
parameters indicating, for example, by how much, as a percentage or absolute value, a certain key becomes more frequent than
another one before it takes its position in the loop, and whether a change in order requires a certain number of
consecutive hits by a character before it is considered for a higher position in the IME loop.
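
A frequency-driven currency loop of the kind described above might be sketched as follows. The class name and the `margin` and `pin_first` parameters are assumptions chosen for illustration; `margin` stands in for the absolute-value parameter mentioned in the text, and the consecutive-hits parameter is not modelled.

```python
class CurrencyLoop:
    """IME loop whose order adapts to how often each symbol is chosen."""

    def __init__(self, symbols, margin=2, pin_first=False):
        self.order = list(symbols)
        self.counts = {s: 0 for s in symbols}
        self.margin = margin        # required lead before a symbol moves up
        self.pin_first = pin_first  # keep the first output constant

    def output(self, presses):
        """Symbol produced after the key is pressed `presses` times."""
        return self.order[(presses - 1) % len(self.order)]

    def record_choice(self, symbol):
        """Count a final user selection and promote the symbol if it now
        leads its predecessor in the loop by at least `margin`."""
        self.counts[symbol] += 1
        i = self.order.index(symbol)
        lowest = 1 if self.pin_first else 0
        while (i > lowest and self.counts[symbol]
               >= self.counts[self.order[i - 1]] + self.margin):
            self.order[i - 1], self.order[i] = self.order[i], self.order[i - 1]
            i -= 1


if __name__ == "__main__":
    loop = CurrencyLoop(["$", "€", "£"])
    loop.record_choice("€")
    loop.record_choice("€")
    print(loop.order)
```

With `pin_first=True` the first press always yields the symbol printed on the key, and only the later steps are reordered.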

Additional Options and Variations 

[0137] Considering that system 100 determines the relationship between 7-bit input characters and their accented
forms, and converts between the two by applying different techniques and considerations, an optional implementation
of system 100 is where system 100 is directly interfaced with contexts where the user enters an Internet Uniform
Resource Locator (URL), commonly referred to as an Internet address, and by determining which characters are
acceptable in the URL string, it converts illegal or prohibited signs to legal signs by applying different variations if more
than one acceptable variation exists. For example, Internet World Wide Web addresses can be recognized because
they begin with prefixes like "http://" or "https://". In the domain and host names which make up such web addresses, it
is currently not allowed to include any character other than letters from A to Z without accents, digits and the minus sign
(hyphen). If the user, remembering a company name such as "Müller", entered a web address of "www.müller.com", the
browser only attempts to connect to the server with the name exactly as typed, which results in a failure or error
because such a domain registration is not even possible. System 100 as described here may first attempt to connect to
www.müller.de, but if that failed, system 100 in turn attempts to connect to www.mueller.de, or www.muller.de, or both,
in a desired order until a connection succeeds. If the domain or host names contained more than one special character,
they are in a similar fashion converted to characters that are acceptable for the type of URL being entered (first
attempting expansion, and optionally stripping of accents, and then optionally combinations thereof). Similarly for the special
signs of Italian, the accents are removed without replacing them with apostrophe signs and also removing any existing
apostrophe signs that may have been entered. The same occurs for the special signs of French and Spanish, leaving
plain letters A to Z. Any syntax that at the time of coding of system 100 is known to be invalid is optionally still attempted,
either as a first try, or as a last attempt, with consideration to the fact that it is likely that special characters will in the
future become acceptable even in domain and host names.
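
The generation of fallback host names can be sketched with the standard `unicodedata` module. The helper names below are hypothetical, the candidate order (as typed, then expansion, then stripping) is only one of the orders the text allows, and no connection attempt is made.

```python
import unicodedata

GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def expand(host):
    """Replace German special characters by their traditional expansions."""
    return "".join(GERMAN_EXPANSIONS.get(c, c) for c in host)


def strip_signs(host):
    """Remove accents and apostrophes, leaving plain letters."""
    host = host.replace("ß", "ss").replace("'", "")
    decomposed = unicodedata.normalize("NFD", host)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def host_candidates(host):
    """Host names to try in order: as typed, expanded, then stripped."""
    host = unicodedata.normalize("NFC", host.lower())
    candidates = []
    for candidate in (host, expand(host), strip_signs(host)):
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


if __name__ == "__main__":
    print(host_candidates("www.müller.de"))
    print(host_candidates("www.città.it"))
```

A caller would attempt each candidate in turn until a connection succeeds.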

[0138] As for the activator events for Italian and other comparable languages, instead of or in addition to the already
mentioned accented vowels, and the apostrophe and similar keys, the activator event may include the repeated press
of a vowel key since vowels almost never appear more than once in a row in Italian, or the action of holding a key
pressed for longer than a certain predefined amount of time. For example, entering "perchee", or entering "perche" and
holding down the last key a little bit longer than usual automatically results in "perché". Pressing "e" again or holding it
down even longer initiates an IME loop thereby proposing further possible signs. A similar technique is applied to
special signs such as opening and closing double quotes. For example, system 100 associates an IME loop to the
double-quote character with the first press generating a plain double quote, and repeated presses producing either opening
and closing double quotes in a predictable constant order, or with the entire order based on frequency and context
considerations. Alternatively, IME loop functionality is implemented in such a way that the first input of the
double-quote character is normally automatically changed to the correct opening or closing double quote, but subsequent
repeated presses restore the original manual input or loop through the other possible related characters.

[0139] The embodiment of system 100 providing the option of simply holding a key pressed is applicable to all
embodiments including to all cases where activator events as well as the repeated input of a certain character are
utilized for activating a certain event such as a correction, a re-correction, or further modification, or the initiation or next
step of an IME loop. Such an action of holding down a key for a predetermined duration optionally provides a special
meaning for some characters and contexts depending on the implementation, and can still be considered as a more
traditional auto repeat for other keys or contexts. As for the meanings that can be associated to certain input patterns,
including holding a key down or repeated press of the same key, with or without consideration to context, system 100
considers and applies an activator event, for example to place an Italian sign, overriding a previous automatic correction,
initiation or continuation of an IME loop, and requests to display some type of information, for example linguistic help.
These are optionally implemented in any combination.

[0140] System 100 optionally includes a number of options to temporarily disable all or part of its actions. This may
be done for example by assigning a certain key or sequence of keys to the temporary turning off of system 100, which
in one embodiment is limited to the next character, or to all characters until the same certain key or sequence of keys,
or another key or sequence of keys, is input, meaning that system 100 continues operation. Certain keys which on most
keyboards have a status indicator light, such as Scroll Lock, can also be used, in which case the light becomes a useful
indicator of system status. Another option to disable system 100 and allow for unmodified input in one embodiment is
to consider certain qualifier keys which when held down during input of other text cause system 100 not to take any
action. Another additional option to allow for unmodified input is to not take any action if the user explicitly resorted to
an Alt + (0) + number combination, or another keyboard sequence which by default is used on system 100 to input
certain characters even if they are not present on the keyboard.

[0141] The actions that cause certain events to occur in system 100 are based in one embodiment on the analysis
of context, for example consideration to pending single quotes, or in German the repeated press of "s" after "s", etc.,
and linguistic and literary factors such as the likelihood that certain text patterns, for example double or triple vowels,
quotes, or currency symbols, appear as part of the traditional text flow or not. System 100 uses certain characters both
in a normal and in a special sense. In a particular embodiment of system 100, a default implementation applied to Italian
consists of the same set of activator events serving dual purposes of allowing for the input of special characters and also
of correcting certain common accent and apostrophe errors.

[0142] One embodiment to handle unknown words for languages such as Italian where accent rules have a strong
focus on the word suffix is where system 100 considers the best accent for an unknown word based on the longest
matching suffix of other word entries and optionally suffix entries in the list of rules. As the number of entries in the rules
increases, such an embodiment produces better results when applied to unknown words than the fallback rule and in
some cases even suffix rules. Furthermore, as an additional option in a case where a previous check does not produce
any matches, for the purpose of matching the suffix of the unknown word with the suffix of existing entries, system 100,
with exception to the last vowel, considers certain sets of letters to be identical, that is to count as a match. For example,
all vowels are considered as a universal vowel matching character, so that "ahime" matches "ohime". To further deal
with unknown words in an optimal manner, system 100 provides different options to extend its dictionary of rules. One
embodiment of system 100 provides for the rules to be updated from the Internet by loading a new set of published rules
and through the user interface where the user may add, edit or remove individual rule entries. Optionally, system 100
either automatically adds to the rules, or modifies a rule entry if it exists in a different format, instances where the user
changed, either with the IME functionality or by re-writing, or by temporarily switching off system 100, the output of
system 100, generating a word that system 100 otherwise changed. Optionally, this step of applying the change to the set
of rules is semi-automatic, not user initiated but using user confirmation. This embodiment also learns new
POSTAPOSTROPHE words or removes them from the list. Optionally, the set of rules includes some flags that are considered
or not based on user interface settings, which determine whether certain ambiguous entries require an explicit choice
by the user rather than system 100 proposing a certain initial output without further action.
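
The longest-matching-suffix lookup with the universal vowel fallback can be illustrated as follows. The rule table is a plain dictionary from unaccented word to flag, and the `min_len` threshold is an assumption introduced so that very short suffix matches are not trusted; neither is taken from the specification.

```python
VOWELS = set("aeiou")


def generalize(word):
    """Replace every vowel except the final letter by a universal '*'."""
    head, last = word[:-1], word[-1:]
    return "".join("*" if c in VOWELS else c for c in head) + last


def common_suffix_len(a, b):
    """Number of trailing characters shared by both strings."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def best_accent(word, rules, min_len=2):
    """Flag of the rule entry sharing the longest suffix with `word`.
    If no entry matches well enough, retry with vowels generalized."""
    for transform in (lambda w: w, generalize):
        target = transform(word)
        best = max(rules, default=None,
                   key=lambda w: common_suffix_len(target, transform(w)))
        if best is not None and \
                common_suffix_len(target, transform(best)) >= min_len:
            return rules[best]
    return None


if __name__ == "__main__":
    rules = {"perche": "ACUTE", "ohime": "GRAVE", "citta": "GRAVE"}
    print(best_accent("benche", rules))
```

A return value of `None` would correspond to falling back to the generic fallback rule for the final vowel.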

[0143] In a further operating mode of system 100, when applied to file or clipboard data, system 100 automatically
detects certain character set errors which result in wrong characters appearing in place of accented letters. To
accomplish this, system 100 uses a series of lists each associated to a common, known, transmission or character set
problem, for example a 7-bit national character set used instead of an 8-bit one, the eighth bit being stripped, a character
set of one system used in the context of a different system, etc. Entries in these lists are used as activator events
equivalent to, for example, accented vowels that are normally used where the correct character set is used. The replacement
list is selected either manually, or automatically, applying all lists to the same text, and then the one list that resulted in
the text containing fewer unknown words is selected based on either a spelling checker dictionary or on the accent rule
entries.
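
The automatic selection among replacement lists can be sketched as below. The list names and their entries are hypothetical sample data, and the dictionary is a plain set of known words standing in for a spelling checker dictionary.

```python
def apply_list(text, replacements):
    """Apply every wrong-to-right character replacement of one list."""
    for wrong, right in replacements.items():
        text = text.replace(wrong, right)
    return text


def count_unknown(text, dictionary):
    """Count the words that are not found in the dictionary."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return sum(1 for w in words if w not in dictionary)


def pick_replacement_list(text, lists, dictionary):
    """Apply all lists to the same text and keep the one that leaves the
    fewest unknown words. Returns the list name and the repaired text."""
    scores = {name: count_unknown(apply_list(text, repl), dictionary)
              for name, repl in lists.items()}
    best = min(scores, key=scores.get)
    return best, apply_list(text, lists[best])


if __name__ == "__main__":
    # Hypothetical sample lists, one per assumed character set problem.
    lists = {
        "national_7bit": {"{": "à", "}": "è", "|": "ò"},
        "utf8_read_as_latin1": {"Ã¨": "è", "Ã©": "é"},
    }
    known = {"la", "città", "è", "bella"}
    print(pick_replacement_list("la citt{ } bella", lists, known))
```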

[0144] In one embodiment of system 100 applied during typing, a small symbol on the screen changes color as a
function of the reliability of the rule that was applied, for example ranking word entries higher than suffix entries, and
entries with only one apostrophe or accent flag as less ambiguous than entries with multiple such flags, and the
completeness of the current text context data, where only one character of available text context data for example causes
the color to appear as a yellow or orange warning. One variation of the rules system is that entries for accented and
unaccented verbal forms need not include all possible variations as a static database, but rather consider that Italian
uses about 110 clearly-defined verb categories, each with its known derived forms, to algorithmically generate only the
required verb forms when necessary.

List of Replacement Rules

[0145] If the word to be replaced has an accent or apostrophe, then it is not placed in the list of replacement rules,
but instead the appropriate COMPLEX attribute in one of the previous lists is used, i.e., five vowel lists, or consonant
list. If system 100 is applied to ancient Italian, or to some current Italian dialects such as in the Rome or Florence
regions where words are often transformed to a truncated form using an apostrophe, then in an alternative embodiment
the fallback rules are set for the vowel lists to APOSTROPHE entries rather than GRAVE or ACUTE. Thus, the list of
accented words is complete and exhaustive because all words with a sign that do not match any rule would be output
as words with an apostrophe rather than words with an accent. In ancient Italian, and in some of today's regional
dialects, it is easier to define accented words rather than words with an apostrophe. In modern Italian, words with an
apostrophe are more limited so in one embodiment these are considered as exceptions from accent rules. In modern Italian
accented words are also easily defined although their number is higher than that of words ending with an apostrophe.
A system giving a higher priority on a complete list of accented words may reach a point where, even for modern Italian,
it may be of advantage to use the APOSTROPHE flag for fallback entries. In one particular embodiment of the invention,
system 100 utilizes resource-efficient rules lists, and is optionally more conservative in reaching conclusions. Thus,
system 100 accommodates words that do not match any rule other than a fallback rule, in which case a more
frequently occurring accent is placed on the word.

[0146] It is conceivable that, where system 100 is adapted for an ancient Italian or regional dialects application, or
in a context where a second language that makes intense use of apostrophe characters is frequently used together with
Italian, system 100 optionally incorporates a feature similar to the re-correction or further modification applied after the
detection of POSTAPOSTROPHE strings, but generalized to all apostrophe characters which are initially converted to
accents then followed by more text rather than non-word characters. Such an optional variation of system 100 is utilized
if the additional feature to automatically insert space characters after words ending with an apostrophe or accent is not
active. The list with the replacement rules is separate from the lists for words ending with a vowel and that for words
ending with a consonant for reasons of logical and computational simplicity. In a case where a word is not terminated
with any activator events, and no other special options are enabled to correct, for example, weekdays or words that
should have a final accent or apostrophe but have been written without one, then system 100 checks the list of
replacement rules if these are enabled rather than also the lists with all the entries which are necessary for the proper
placement of accents and apostrophe signs. Trigrams, combinations of three letters, are optionally used instead of bigrams
to further improve the recognition accuracy at the expense of some additional memory requirements.

Example Implementation 

[0147] The following is a description of a sample implementation of a text processing system in accordance with the
present invention. The implementation is described using a pseudocode type description. Junction points are marked
using angle brackets to indicate branch points in the logic flow. Comments begin with double slash characters (//),
and the event loop starts from [Procedure MainInputLoop].

[Procedure CheckVowelWordSigns]

[0148]

// This procedure gets an input word CurrentWord, ending with a vowel with or without diacritical sign or apostrophe,
and returns accent/apostrophe information about the word. By definition, a word is a string of one or more
alphabetic letters with or without diacritical signs, in upper or lower case, allowing for hyphen signs inside the word,
as long as each hyphen is both preceded and followed by at least one alphabetic letter, and allowing for one
optional apostrophe at the end of the word.
[normalize CurrentWord, converting all letters to lower case letters without diacritical signs, and removing final
apostrophe, if present]
// The above step is optional if the comparison functions ignore differences in case and diacritical marks
[If CurrentWord ends with letter A, set CurrentList to List-A]
[If CurrentWord ends with letter E, set CurrentList to List-E]
[If CurrentWord ends with letter I, set CurrentList to List-I]
[If CurrentWord ends with letter O, set CurrentList to List-O]
[If CurrentWord ends with letter U, set CurrentList to List-U]
[ListPosition = beginning of CurrentList]

(Junction 1)

[0149]

[(Rule at ListPosition in CurrentList is a suffix rule and CurrentWord ends with that suffix) OR (Rule at ListPosition
in CurrentList is instead a word rule and it matches CurrentWord)?] If No, increase ListPosition and goto Junction 1.
// Now we have a positive match. In the worst case it's the FALLBACK rule, which is at the end of the first part of
the list.
[Set ReturnRule to Rule at ListPosition in CurrentList]
[If ReturnRule is a suffix rule, then set ReturnAttribute to NOT EXHAUSTIVE, else set ReturnAttribute to EXHAUSTIVE]
[If ReturnRule has APOSTROPHERARE flag and current program settings indicate to ignore such case, remove
APOSTROPHE flag from ReturnRule]
[If Rule does not have FALLBACK flag, set ListPosition to position in CurrentList where the rule with FALLBACK flag
is located]
[Set ListPosition to next position]
// Now we are at the first rule after FALLBACK, which is either the first entry in the second part of the list, or we are
beyond the end of the list

(Junction 2)

[0150]

[If ListPosition is beyond end of list] Return "ReturnRule, ReturnAttribute"
[(Rule at ListPosition in CurrentList is a suffix rule and CurrentWord ends with that suffix) OR (Rule at ListPosition
in CurrentList is instead a word rule and it matches CurrentWord)?] If No, increase ListPosition and goto Junction 2.
// Now we have a positive match in the second part of the list
[Set ReturnRule to Rule at ListPosition in CurrentList]
[If ReturnRule has APOSTROPHERARE flag and current program settings indicate to ignore such case, remove
APOSTROPHE flag from ReturnRule]
[Set ReturnAttribute to EXHAUSTIVE]
Return "ReturnRule, ReturnAttribute"
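
The two-pass lookup of CheckVowelWordSigns can be sketched in Python as follows. This is only an illustrative sketch: the Rule class, the modelling of the FALLBACK rule as a suffix rule with an empty suffix, and the sample entries are assumptions that do not appear in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    pattern: str            # suffix or whole word, already normalized
    is_suffix: bool         # True: suffix rule, False: word rule
    flags: frozenset        # e.g. GRAVE, ACUTE, APOSTROPHE, NOTHING
    fallback: bool = False  # marks the end of the first part of the list

    def matches(self, word):
        if self.is_suffix:
            return word.endswith(self.pattern)
        return word == self.pattern


def check_vowel_word_signs(word, rules, ignore_rare_apostrophe=False):
    """Return (pattern, flags, attribute) for a normalized word."""
    fallback_pos = next(i for i, r in enumerate(rules) if r.fallback)
    # Junction 1: first match in the first part; the fallback always matches.
    first = next(i for i in range(fallback_pos + 1) if rules[i].matches(word))
    rule = rules[first]
    attribute = "NOT EXHAUSTIVE" if rule.is_suffix else "EXHAUSTIVE"
    # Junction 2: a match after the fallback overrides the first result.
    for later in rules[fallback_pos + 1:]:
        if later.matches(word):
            rule, attribute = later, "EXHAUSTIVE"
            break
    flags = set(rule.flags)
    if ignore_rare_apostrophe and "APOSTROPHERARE" in flags:
        flags.discard("APOSTROPHE")
    return rule.pattern, flags, attribute


# Hypothetical List-A contents, for illustration only.
LIST_A = [
    Rule("citta", False, frozenset({"GRAVE"})),
    Rule("ta", True, frozenset({"GRAVE", "NOTHING"})),
    Rule("", True, frozenset({"NOTHING"}), fallback=True),
    Rule("papa", False, frozenset({"GRAVE", "NOTHING"})),
]
```

A word rule yields EXHAUSTIVE, a suffix rule (including the fallback) yields NOT EXHAUSTIVE, and any match in the second part of the list replaces the earlier result.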

[Procedure CheckConsonantWordSigns]

[0151]

// This procedure gets an input word CurrentWord, ending with a consonant with or without apostrophe after it, and
returns accent/apostrophe information about the word.
[normalize CurrentWord, converting all letters to lower case letters without diacritical signs, and removing final
apostrophe, if present]
// The above step is optional if the comparison functions ignore differences in case and diacritical marks
[Set CurrentList to List-Consonants]
[ListPosition = beginning of CurrentList]

(Junction 1)

[0152]

[If ListPosition is beyond end of list] Return "No entry found"
[(Rule at ListPosition in CurrentList is a suffix rule and CurrentWord ends with that suffix) OR (Rule at ListPosition
in CurrentList is instead a word rule and it matches CurrentWord)?] If No, increase ListPosition and goto Junction 1.
[Set ReturnRule to Rule at ListPosition in CurrentList]
[If ReturnRule has APOSTROPHERARE flag and current program settings indicate to ignore such case, remove
APOSTROPHE flag from ReturnRule]
[If ReturnRule is a suffix rule, then set ReturnAttribute to NOT EXHAUSTIVE, else set ReturnAttribute to EXHAUSTIVE]
Return "ReturnRule, ReturnAttribute"
[Procedure CheckPostApostrophe] 
[0153] 

[normalize CurrentWord, converting all letters to lower case letters without diacritical signs, and removing final
apostrophe, if present]
// The above step is optional if the comparison functions ignore differences in case and diacritical marks
[Set CurrentList to List-PostApostrophe]
[ListPosition = beginning of CurrentList]

(Junction 1) 

[0154] 

[If ListPosition is beyond end of list] Return "NO"
[Rule at ListPosition in CurrentList matches CurrentWord?] If No, increase ListPosition and goto Junction 1.
Return "YES"
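
The normalization step shared by the lookup procedures, followed by the CheckPostApostrophe list scan, might look like this in Python. It is a minimal sketch: the function names and the two list entries are hypothetical, and stripping diacritics through Unicode decomposition is one possible way to perform the normalization, not necessarily the one intended by the description.

```python
import unicodedata


def normalize_word(word):
    """Lower-case the word, strip diacritical signs and a final apostrophe."""
    if word.endswith("'"):
        word = word[:-1]
    decomposed = unicodedata.normalize("NFD", word.lower())
    # Combining marks carry the accents after decomposition; drop them.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical contents of List-PostApostrophe, for illustration only.
LIST_POST_APOSTROPHE = ["amica", "altro"]


def check_post_apostrophe(word, entries=LIST_POST_APOSTROPHE):
    """Return "YES" if the normalized word matches an entry, else "NO"."""
    return "YES" if normalize_word(word) in entries else "NO"
```

For example, normalize_word("Città") gives "citta", so comparisons against the rule lists do not depend on case or accents.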

[Procedure CheckReplacement]

[0155]

[normalize CurrentWord, converting all letters to lower case letters without diacritical signs, and removing final
apostrophe, if present]
// The above step is optional if the comparison functions ignore differences in case and diacritical marks
[Set CurrentList to List-Replacements]
[ListPosition = beginning of CurrentList]

(Junction 1)

[0156] 

[If ListPosition is beyond end of list] Return "No entry found"
[Rule at ListPosition in CurrentList matches CurrentWord?] If No, increase ListPosition and goto Junction 1.
[Set ReturnRule to Rule at ListPosition in CurrentList]
Return "ReturnRule"
// by definition must be of type COMPLEX
[Procedure IsItalian]
[0157]
[If the current application supports querying about language in use at current text input position, query application
and return Yes if language is Italian, and No if not]
// Different approaches may be undertaken
[Is there a rules list with exhaustive entries for all Italian words?] If Yes, check if word appears in vowel/consonant
rules lists (return attribute must be EXHAUSTIVE) and is not flagged as NONITALIAN, and return Yes if the word
is found and is Italian, and No if not
[No rules lists with exhaustive word entries? Then apply other algorithm, for example looking up all letter pairs in
current word in a bigram table having 1 entries for letter pairs that exist in Italian words, and 0 for letter pairs that are
not used in Italian, and return No if any pair of two consecutive letters in the word produces a 0, or otherwise return
Yes]

Sample bigram table for Italian (real data, but variations are possible to allow for different levels of tolerance, e.g.
with more or less consideration towards rare words and patterns, etc.):

    ABCDEFGHIJKLMNOPQRSTUVWXYZ

A = [illegible] // row for pairs "aa" to "az"
B = [illegible] // row for pairs "ba" to "bz"
C = [illegible]
D = [illegible]
E = [illegible]
F = 10001100100100100100100000
G = 10001011100111100100100000
H = 10001000100000100000000000
I = 11111110100111111111110001
J = 00000000000000000000000000
K = 00000000000000000000000000
L = 11111110100111111011110001
M = 11001000100010110000100000
N = 10111110100001101011110001
O = 11111110100111110111110001
P = 10001000100100110111100001
Q = 00000000000000000000100000
R = 11111110100111111111110001
S = 11101110100110111111110000
T = 10001000100010100101100000
U = 11111110100111110111000001
V = 10001000100000100100110000
W = 00000000000000000000000000
X = 00000000000000000000000000
Y = 00000000000000000000000000
Z = 10001000100000100000100001
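
The bigram test of IsItalian can be sketched in Python as shown here. The sketch uses rows F to Z as they can be read from the sample table; rows A to E are omitted, so the function raises an error for a pair starting with one of those letters rather than guessing. The function and variable names are assumptions, and the input is assumed to be already normalized (no diacritical signs).

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# One row per first letter; column n is 1 if the pair with the n-th letter
# of the alphabet as second letter occurs in Italian words.
BIGRAM_ROWS = {
    "F": "10001100100100100100100000",
    "G": "10001011100111100100100000",
    "H": "10001000100000100000000000",
    "I": "11111110100111111111110001",
    "J": "0" * 26,
    "K": "0" * 26,
    "L": "11111110100111111011110001",
    "M": "11001000100010110000100000",
    "N": "10111110100001101011110001",
    "O": "11111110100111110111110001",
    "P": "10001000100100110111100001",
    "Q": "00000000000000000000100000",
    "R": "11111110100111111111110001",
    "S": "11101110100110111111110000",
    "T": "10001000100010100101100000",
    "U": "11111110100111110111000001",
    "V": "10001000100000100100110000",
    "W": "0" * 26,
    "X": "0" * 26,
    "Y": "0" * 26,
    "Z": "10001000100000100000100001",
}


def is_italian_by_bigrams(word, rows=BIGRAM_ROWS):
    """Return False as soon as one consecutive letter pair has a 0 entry."""
    text = word.upper()
    for first, second in zip(text, text[1:]):
        column = ALPHABET.find(second)
        if ALPHABET.find(first) == -1 or column == -1:
            continue  # pairs involving a hyphen or apostrophe are skipped
        if first not in rows:
            raise KeyError("no bigram row available for letter " + first)
        if rows[first][column] == "0":
            return False
    return True
```

With these rows, "tutti" and "fumo" pass, while "smith" fails on the pair "th" and "whisky" fails on "wh".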



[Procedure GetNewContext] 
[0158]

// Get (at least) the current word, until the cursor position. If here, then there was no opportunity to create a context
buffer yet, or it was lost after vertical cursor movement, mouse action or keyboard, menu or other command that
might have affected the text. When requesting or getting context data, system 100 attempts to also get insert/overstrike,
language and pending closing single quote information. If no such data is available, default values are used.
If however context data was previously available for that input window, then the insert/overstrike setting is
preserved, and not reset to a default value.
[If operating system 126 and current application 130 support querying of context data as part of an application-specific
interface, or for purposes of accessibility for disabled users, or as part of an IME interface, or as part of an error
handling interface, or as part of any other interface capable of providing that information, the context is obtained
from there, and return]
[If the hardware, operating system 126 or current application 130 (e.g. a word processor) provides a way of directly
accessing the text buffer memory (RAM), e.g. because the memory region is constant, or pointers to that region are
known, the context is obtained from there, and return]
[Locate cursor position and apply OCR to get current word context, for example using library provided by a remote
system 266 via network 264. If the screen is not bitmapped, but character-mode, it is only necessary to isolate text
from non-relevant characters. Return if successful.]
Clear local context data and return
// Fail, there is no context data

[Procedure MainInputLoop]

[0159]

// Main input loop. Shields the text-processing part from a few non-text-stream issues.
(Junction 1)

[0160]

[Wait for keyboard or mouse event]
[Non-character event potentially involving context disruption?] If Yes, GetNewContext and goto Junction 1
// Events that involve context disruption include: new window; cursor up/down and other cursor positioning keys
(e.g. Home, End, Page Up, Page Down) other than cursor left/right; mouse click events that cause cursor to be
repositioned; menu selections and keyboard accelerators to menu selections; command shortcuts.
[Cursor left or right, or Delete, or Backspace key?] If Yes, update local context buffer contents and insertion position
accordingly, then goto Junction 1
[Insert key?] If Yes, update insert/overstrike mode status in context data, then goto Junction 1.
[Does current application support notification of language and insert/overstrike status?] If Yes, read these settings
again and update them in the local context data.
[Text character?] If Yes, update local context buffer, then call TextInputLoop. If TextInputLoop applied changes to
the output stream, update context buffer again and send backspace or cursor-left and new characters as appropriate.
// Text characters include letters, numbers, space, punctuation, and all other graphical characters that produce an
editing action.
Goto Junction 1
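
The event classification performed at the top of MainInputLoop could be sketched as below. The event names and the returned action labels are hypothetical; an actual system would take them from the keyboard and mouse interfaces of the operating system in use.

```python
# Hypothetical event names; non-character events that disrupt the context.
CONTEXT_DISRUPTING = {
    "NewWindow", "Up", "Down", "Home", "End", "PageUp", "PageDown",
    "MouseClick", "MenuSelect",
}
# Keys that only require the local context buffer to be adjusted.
CURSOR_EDITING = {"Left", "Right", "Delete", "Backspace"}


def classify_event(event):
    """Map one input event to the action taken by the main input loop."""
    if event in CONTEXT_DISRUPTING:
        return "get_new_context"
    if event in CURSOR_EDITING:
        return "update_context_buffer"
    if event == "Insert":
        return "toggle_insert_overstrike"
    if len(event) == 1 and event.isprintable():
        return "text_input"  # letters, digits, space, punctuation
    return "ignore"
```

Only events classified as "text_input" reach TextInputLoop; all others are handled inside the main loop.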

[Procedure TextInputLoop]

[0161]

// Text input loop, gets called whenever a new character appears in the input stream.
[Is the character an activator event?] If Yes, execute ActivatorCharacter, then Return.
// Depending on program options, activator events can be the apostrophe and similar characters, or an accented
letter
[Non-word character?] If Yes, execute NonWordCharacter.
Return

[Procedure ActivatorCharacter]
[0162]

// If we are here, in an Italian implementation it means that the user pressed an accented vowel key, or an apostrophe
after a word
[Same position as a previous activator event that caused system action, which was manually changed by user?] If
Yes, Return.
// Do nothing if something just happened, and the user changed what was done. If however nothing is done, and
the user again changes the input, then again something is done, because this time the previous time was not such
that system action was initiated. The result is that something is done every second time.
[Same activator event as previous character?] If Yes, execute next step in ImeLoop, then Return.
[Acute character, or other character equivalent to apostrophe?] If Yes, replace it with apostrophe
// Note: change in local context data only; later all differences will be cumulatively applied to output stream. This
transformation from acute etc. to apostrophe is an example of many optional things that can be done.
[Activator event is accented vowel?] If Yes, add it to the current word
// Here system 100 is adding to the local context data. The application already received the character, and what can
be done (later, if necessary) is to send a fake backspace (and/or cursor left, if overstrike mode) followed by new
data (unless backspace alone was sufficient).
[Last character of the current word is a vowel?] If Yes CheckVowelWordSigns else CheckConsonantWordSigns
[No ReturnRule was found?] If so, Return
// This can happen only if the word ended with a consonant, and input was OK (word not in list), because vowel lists
provide fallback rules
[ReturnRule has COMPLEX attribute?] If Yes, replace word with COMPLEX word (if the word was different), and
Return.
// This is an example of action that can be turned on or off by the user, or depending on the implementation
// Only three cases possible: vowel with accent, vowel with apostrophe, consonant with apostrophe
[Activator event is on or after vowel?] If Yes
{
[Activator event is accent?] If Yes, ProcessVowelAccent, else ProcessVowelApostrophe
}
Else ProcessConsonantApostrophe
Return

[Procedure NonWordCharacter]

[0163]

// Checks replacement rules, and POSTAPOSTROPHE
[Did system 100 change user input of apostrophe immediately before this word?] If Yes
{
[CheckPostApostrophe gives POSTAPOSTROPHE match on current word?]
If Yes, restore previously changed apostrophe, and Return
}
CheckReplacement
[ReturnRule has COMPLEX attribute?] If Yes, replace word with COMPLEX word (if the word was different)
Return

[Procedure ProcessVowelApostrophe]
[0164]

// Handling of Word ending with vowel and followed by apostrophe activator event. Word in this entire subroutine
means word with apostrophe
// This is one of the most complex cases, because the intention may have been to enter a closing single quote (an
opening single quote would not have immediately followed a word, but rather it would have preceded it)
[ReturnRule has APOSTROPHE attribute, and none of GRAVE or ACUTE or CIRCUMFLEX attributes?] If Yes
{
NewWord = Word
[ReturnRule also has NOTHING flag?] If Yes, Optionally (based on implementation and/or settings): inform
user via tool tip that the case is ambiguous, and could be resolved in more than one way, adding additional
ReturnRule information as appropriate
Return
}
// The above is the simplest case: no ambiguity, nothing to correct; we could however issue an optional information
message if the rule also had the NOTHING flag. Most practical ambiguities are however taken care of via
APOSTROPHERARE, which is already filtered based on implementation/settings.
[ReturnRule has APOSTROPHE attribute, and expecting a closing single quote?] If Yes
{
NewWord = Word
Optionally (based on implementation and/or settings): inform user via tool tip that the case is ambiguous, and
could be resolved in more than one way, adding additional ReturnRule information as appropriate
Return
}
// The above represents a statistical fact that if the word can be written with an apostrophe, and a closing quote is
expected, then it is more likely that the user actually wanted to input an apostrophe, even if the word can also be
written with an accent
[ReturnRule has NOTHING attribute, and none of APOSTROPHE, GRAVE or ACUTE or CIRCUMFLEX attributes?] If Yes
{
[Expecting a closing single quote?] If Yes, NewWord = Word
Else NewWord = Word without apostrophe
Return
}
// If system 100 is here, either there is APOSTROPHE as well as one or more accent flags, in which case, based
on a statistical choice, priority is given to the accent (if only one, which is usually the case), or there are one or more
accent flags only, in which case system 100 changes the apostrophe input to an accent output. This all means that the
APOSTROPHE flag can be ignored from here on, as it does not change anything, because the cases in which an
apostrophe is output have all already been considered
[ReturnRule has more than one of NOTHING or GRAVE or ACUTE or CIRCUMFLEX attributes?] If Yes
{
NewWord = Word with first most likely accent (first item of sequence used for IME loop for last vowel in word,
considering only the accent flags in ReturnRule)
Optionally (based on implementation and/or settings): inform user via tool tip that the case is ambiguous, and
could be resolved in more than one way, adding additional ReturnRule information as appropriate
Return
}
// Note: the above is rare
// If system 100 is here, it means that the word has one and only one of GRAVE or ACUTE or CIRCUMFLEX, plus,
possibly, APOSTROPHE
NewWord = Word with accent as per single accent flag (GRAVE or ACUTE or CIRCUMFLEX)
[ReturnRule also has APOSTROPHE flag?] Optionally (based on implementation and/or settings): inform user via
tool tip that the case is ambiguous, and could be resolved in more than one way, adding additional ReturnRule
information as appropriate
Return
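
The decision sequence of ProcessVowelApostrophe can be condensed into a short Python sketch. Several points are assumptions made for illustration: the flags are passed as a set of strings, the result is a pair (NewWord, ambiguous) where the second item stands for the optional tool tip, and the preference order GRAVE, ACUTE, CIRCUMFLEX replaces the IME loop sequence, which is not reproduced here.

```python
ACCENT_ORDER = ("GRAVE", "ACUTE", "CIRCUMFLEX")  # assumed preference order
ACCENTED = {
    "GRAVE": dict(zip("aeiou", "àèìòù")),
    "ACUTE": dict(zip("aeiou", "áéíóú")),
    "CIRCUMFLEX": dict(zip("aeiou", "âêîôû")),
}


def strip_apostrophe(word):
    return word[:-1] if word.endswith("'") else word


def with_accent(word, accent):
    """Replace the final vowel (before the apostrophe) by its accented form."""
    base = strip_apostrophe(word)
    return base[:-1] + ACCENTED[accent][base[-1]]


def process_vowel_apostrophe(word, flags, expecting_closing_quote=False):
    """Return (new_word, ambiguous) for a word typed with a final apostrophe."""
    accents = [a for a in ACCENT_ORDER if a in flags]
    has_apostrophe = "APOSTROPHE" in flags
    has_nothing = "NOTHING" in flags
    if has_apostrophe and not accents:
        return word, has_nothing           # apostrophe is the only sign
    if has_apostrophe and expecting_closing_quote:
        return word, True                  # apostrophe or closing quote
    if not accents:
        if has_nothing and not expecting_closing_quote:
            return strip_apostrophe(word), False
        return word, False                 # keep input as a closing quote
    # Only accent outcomes remain; APOSTROPHE no longer changes the output.
    ambiguous = has_apostrophe or len(accents) + has_nothing > 1
    return with_accent(word, accents[0]), ambiguous
```

For instance, process_vowel_apostrophe("citta'", {"GRAVE"}) returns ("città", False), while the same call with {"GRAVE", "APOSTROPHE"} and a pending closing quote leaves the input unchanged and reports the ambiguity.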

[Procedure ProcessVowelAccent]
[0165]

// Handling of Word ending with vowel input as accented vowel activator event
// This case is simpler than vowel + apostrophe, because although the accent could be wrong, having to be
changed to nothing, or to an apostrophe, the case where a closing quote could have been intended does not apply
here
[ReturnRule has only one of APOSTROPHE, GRAVE, ACUTE or CIRCUMFLEX?] If Yes
{
NewWord = Word with sign as specified by APOSTROPHE, GRAVE, ACUTE or CIRCUMFLEX
[ReturnRule also has NOTHING flag?] If Yes, Optionally (based on implementation and/or settings): inform
user via tool tip that the case is ambiguous, and could be resolved in more than one way, adding additional
ReturnRule information as appropriate
Return
}
// If here, the ReturnRule has NOTHING and/or more than one accent/apostrophe flag
[ReturnRule has more than one of APOSTROPHE, GRAVE, ACUTE or CIRCUMFLEX?] If Yes
{
NewWord = Word, if compatible with ReturnRule flags, or otherwise Word with first most likely accent (first item
of sequence used for IME loop for last vowel in word, considering only the accent flags we have in ReturnRule)
Optionally (based on implementation and/or settings): inform user via tool tip that the case is ambiguous, and
could be resolved in more than one way, adding additional ReturnRule information as appropriate
Return
}
// If here, the ReturnRule has NOTHING flag and no accent or apostrophe flag
NewWord = Word without accent
Return

[Procedure ProcessConsonantApostrophe]
[0166]

// Handling of Word ending with consonant and apostrophe activator event
// Here a closing quote could be expected, but the number of possibilities for the word itself are only two:
apostrophe or no apostrophe (consonants do not have accents)
[ReturnRule has APOSTROPHE attribute?] If Yes
{
NewWord = Word
Return
}
// If here, the ReturnRule has a NOTHING flag and no APOSTROPHE
[Expecting a closing single quote?] If Yes
{
NewWord = Word
Return
}
// Note: in cases like the above, if in a very demanding editorial context, system 100 ensures that the apostrophe
found here after a word with NOTHING and no APOSTROPHE flag actually was the closing quote being looked
for, rather than a mistake. In such a demanding context, appropriate information messages are optionally used.
NewWord = Word without apostrophe, followed by space
Return
// Note: in case this space is followed by a punctuation sign, system 100 optionally re-corrects or further modifies
the automatically inserted space, removing it. Optionally, system 100 does not even add a space character in the
first place.
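
The consonant case reduces to three outcomes, which can be sketched in Python as follows. The function name, the representation of flags as a set of strings and the add_space parameter (standing for the optional automatic space) are assumptions made for this sketch.

```python
def process_consonant_apostrophe(word, flags,
                                 expecting_closing_quote=False,
                                 add_space=True):
    """Return NewWord for a consonant-final word typed with an apostrophe."""
    if "APOSTROPHE" in flags:
        return word   # the word is legitimately written with an apostrophe
    if expecting_closing_quote:
        return word   # the apostrophe is taken as the closing single quote
    # Otherwise the apostrophe is dropped, optionally followed by a space.
    stripped = word[:-1] if word.endswith("'") else word
    return stripped + (" " if add_space else "")
```

As an example under these assumptions, "dell'" with the APOSTROPHE flag is kept unchanged, while "bar'" with only the NOTHING flag becomes "bar" followed by a space.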

[0167] It is believed that the method and apparatus for processing text and character data of the present invention
and many of its attendant advantages will be understood by the foregoing description, and it will be apparent that
various changes may be made in the form, construction and arrangement of the components thereof without departing
from the scope and spirit of the invention or without sacrificing all of its material advantages, the form herein before
described being merely an explanatory embodiment thereof. It is the intention of the following claims to encompass and
include such changes. The invention described herein need not implement or require any one particular or all of the
embodiments or parts thereof; indeed a system, hardware, or software, may optionally implement any one or more of
the embodiments described herein, in whole or in part, at all times or less than at all times, and without requiring any
one or more remaining embodiments thereof, in whole or in part, without departing from the spirit or scope of the
invention and without providing substantial change thereto. For example, a system may be optimized for Italian text
processing with or without using German text processing, or alternatively a system may be optimized for German text
processing with or without Italian text processing. Furthermore, a system implementing text processing in accordance
with the present invention may be optimized for processing one level or formality of text, for example for newspaper or
newsprint text, or alternatively may be optimized for another level or formality of text, such as scientific literature, or
alternatively may be optimized for another level of formality of text, such as popular fiction, without implementing or
being optimized for other levels or formalities of text as determined by requirements and the desired level or formality
of processing, and without departing from the scope or spirit of the invention and without providing substantial change
thereto.

Claims 

1. An apparatus, comprising:

means for receiving input text;

means for detecting an activator event in the input text; and

means for modifying a word in the input text in response to said detecting means detecting an activator event.

2. An apparatus as claimed in claim 1, wherein the activator event includes actuation of a predetermined key of a
keyboard and/or follows actuation of another predetermined key of said keyboard.

3. An apparatus as claimed in claim 2, the predetermined key of a keyboard being at least one of an apostrophe key,
a vowel key, a currency key, an accented letter key, and a punctuation key.

4. An apparatus as claimed in claim 1, further comprising means for detecting a subsequent activator event in the
input text in succession from a first detected actuator event.

5. An apparatus as claimed in claim 1, wherein characters not included in a character set of the input text are encoded
using the character set for the text input.

6. An apparatus as claimed in claim 1, wherein said detecting means detects an activator event based upon a context
of the input text.

7. An apparatus as claimed in claim 1, wherein said modifying means selects a modification of the word based upon
a frequency of occurrence of available word modifications.

8. An apparatus as claimed in claim 1, wherein said modifying means selects a modification of the word based upon
grammar rules of a language of the input text.

9. An apparatus as claimed in claim 1, said modifying means selecting a modification based upon a previous word
modification selected by a user.

10. An apparatus as claimed in claim 1, said modifying means providing optimal accenting and punctuation of a word
in the text input.

11. A computer readable medium tangibly embodying computer readable code stored thereon for implementing a
method for processing text, the method comprising:

receiving input text;

detecting an activator event in the input text; and

modifying a word in the input text in response to said detecting means detecting an activator event.

12. An apparatus, comprising:

means for receiving input text;

means for detecting an apostrophe in the input text;

means for initiating an input method editor loop in response to said detecting means detecting an apostrophe
in the input text;

and means for modifying a word in the input text based upon a word modification contained in the input editor
loop.

13. An apparatus as claimed in claim 12, the input method editor loop containing a hierarchy of word modifications in
a predetermined hierarchy.

14. An apparatus as claimed in claim 13, said modifying means providing successive modifications of the word upon
successive apostrophes detected by said detecting means.

15. An apparatus as claimed in claim 14, wherein the successive modifications provided by said modifying means are
implemented according to a hierarchy of word modifications in the input method editor loop.

16. An apparatus as claimed in claim 12, wherein the input method editor loop includes a hierarchy of word
modifications in an order determined by Italian language rules and/or frequency of word modifications in the loop.

17. An apparatus as claimed in claim 12, said modifying means modifying the word to provide an optimally accented
form of the word without requiring a user to select an accented form of the word.